TITLE

PLANT GENE FOR p-HYDROXYPHENYLPYRUVATE DIOXYGENASE

FIELD OF THE INVENTION 
This invention relates to the isolation and modification of nucleic acid
encoding p-hydroxyphenylpyruvate dioxygenase enzyme from plants. These
nucleic acid sequences were used to establish methods of identification of new
herbicidal compounds that inhibit the activity of this enzyme, and to prepare new
crop plants that are tolerant to the herbicidal action of inhibitors of this enzyme.
Chimeric genes comprising nucleic acid fragments containing all or part of the
nucleic acid sequences encoding p-hydroxyphenylpyruvate dioxygenase may be
used to produce active plant p-hydroxyphenylpyruvate dioxygenase enzyme in
microorganisms, and to cause the production of modified forms of the enzyme in
plants that may render such plants tolerant to inhibitors of the enzyme.

BACKGROUND OF THE INVENTION 
Bleaching herbicides affect plant chloroplasts by decreasing their
chlorophyll and carotenoid content. Several bleaching herbicides are known to
inhibit the enzyme phytoene desaturase, resulting in the accumulation of phytoene
in treated plants. However, compounds of the benzoyl cyclohexane-1,3-dione
type cause the accumulation of phytoene in plants but are not inhibitors of
phytoene desaturase in vitro (Sandmann, G., et al. (1990) Pestic. Sci. 30:353-355).
Subsequent work revealed that these compounds are effective inhibitors of
p-hydroxyphenylpyruvate dioxygenase (p-hydroxyphenylpyruvate:oxygen
oxidoreductase, EC 1.13.11.27), a key enzyme in the biosynthesis of
plastoquinones and tocopherols (Schulz, A., et al. (1993) FEBS Lett.
318:162-166). Based on the observation that phytoene desaturase requires a
quinone as an electron acceptor, these authors postulated that by inhibiting
p-hydroxyphenylpyruvate dioxygenase, these herbicides act indirectly on
phytoene desaturase by blocking the biosynthesis of quinones.

The proposal that p-hydroxyphenylpyruvate dioxygenase is essential for
carotenoid biosynthesis has received support from genetic studies in the plant
model system Arabidopsis thaliana. Mutations in the pds1 and pds2 genetic loci
result in mutant plants that accumulate phytoene. However, genetic mapping of
these mutant genes indicates that they do not correspond to the gene encoding the
enzyme phytoene desaturase. The pds1 mutation can be rescued by homogentisic
acid, the product of p-hydroxyphenylpyruvate dioxygenase. Therefore, this
mutation corresponds to a defect in the activity of p-hydroxyphenylpyruvate
dioxygenase (Norris, S. R., et al. (1995) Plant Cell 7:2139-2149).

In light of these disclosures, p-hydroxyphenylpyruvate dioxygenase is a
promising target for new herbicidal compounds. Research aimed at
discovering new herbicides based on this mode of action would be greatly
facilitated by the isolation of the plant gene encoding this enzyme and by the
functional expression of this gene in transgenic organisms. For example, active
enzyme produced in recombinant microorganisms could be used to establish
screening methods for the identification of novel active compounds and to obtain
structural and mechanistic information useful to guide further chemical synthesis.
Furthermore, isolation of this gene would facilitate research aimed at generating
mutant, herbicide-tolerant versions of the enzyme that may confer herbicide
resistance to transgenic plants.

A partial sequence of an Arabidopsis thaliana cDNA with homology to
corresponding mammalian sequences encoding p-hydroxyphenylpyruvate
dioxygenase has been identified (GenBank Accession No. T20952), but this
truncated sequence is insufficient to identify an active plant
p-hydroxyphenylpyruvate dioxygenase. WO 96/38567 A2 addresses the utility that
would be attached to a DNA sequence of a p-hydroxyphenylpyruvate dioxygenase
gene, but there is no biochemical evidence of function associated with the
sequences disclosed.

SUMMARY OF THE INVENTION

This invention pertains to the isolation and characterization of nucleic acid
fragments encoding plant p-hydroxyphenylpyruvate dioxygenase enzymes. More
specifically, this invention pertains to isolated nucleic acid fragments encoding the
p-hydroxyphenylpyruvate dioxygenase enzymes from Arabidopsis thaliana and
Zea mays.

This invention also pertains to the production of active plant
p-hydroxyphenylpyruvate dioxygenase enzyme in E. coli. In one embodiment, a chimeric
gene comprising a nucleic acid fragment encoding a polypeptide that possesses
p-hydroxyphenylpyruvate dioxygenase activity, operably linked to regulatory
sequences that direct gene expression in E. coli, is claimed. In another
embodiment, a plasmid vector comprising said chimeric gene is disclosed. In yet
another embodiment, a transformed E. coli comprising a chimeric gene consisting
of a nucleic acid fragment encoding a polypeptide that possesses
p-hydroxyphenylpyruvate dioxygenase activity is disclosed.

This invention also pertains to a method of identifying substances that
inhibit the rate of the reaction of p-hydroxyphenylpyruvate dioxygenase enzyme.
In one embodiment, the invention pertains to an assay for the detection of
inhibitors of p-hydroxyphenylpyruvate dioxygenase wherein a polypeptide
derived from a transformed E. coli that displays p-hydroxyphenylpyruvate
dioxygenase activity is incubated in the presence of a test substance. Following
incubation, p-hydroxyphenylpyruvate dioxygenase enzymatic activity is measured,
wherein a reduction of enzymatic activity is indicative of the inhibitory capacity
of the test substance. Enzymatic activity can be measured by any appropriate
means, including but not limited to oxygen utilization, carbon dioxide release,
homogentisate production, and loss of p-hydroxyphenylpyruvate. Results are
quantified by radiometric, colorimetric or chromatographic means.

In another embodiment, this invention pertains to plants that are
substantially tolerant to the application of at least one compound that inhibits the
rate of the reaction of p-hydroxyphenylpyruvate dioxygenase. Plants may be
rendered tolerant by overexpression of the wild-type p-hydroxyphenylpyruvate
dioxygenase, by expression of a naturally occurring resistant variant of this
enzyme, or by expression of an altered form of p-hydroxyphenylpyruvate
dioxygenase that is resistant to the action of compounds that are inhibitory to the
wild-type enzyme.

A further embodiment of the invention is an isolated nucleic acid fragment 
comprising a member selected from the group consisting of: 

(a) an isolated nucleic acid fragment as set forth in SEQ ID NO:16;

(b) an isolated nucleic acid fragment that is essentially similar to an
isolated nucleic acid fragment as set forth in SEQ ID NO:16;
and

(c) an isolated nucleic acid fragment that is complementary to (a) or 



BRIEF DESCRIPTION OF THE 
DRAWINGS AND SEQUENCE DESCRIPTIONS 
The invention can be more fully understood from the following detailed 
description and the accompanying drawings and the sequence descriptions which 
form a part of this application. 

Figure 1 presents a partial nucleic acid sequence of an expressed sequence
tag (EST) bearing GenBank Accession No. T92052 obtained from an Arabidopsis
thaliana cDNA library. This sequence was contained in clone 91B13T7 of the
library.

Figure 2 presents the nucleic acid sequence of the cloned cDNA encoding a
full-length form of Arabidopsis thaliana p-hydroxyphenylpyruvate dioxygenase
enzyme, as it was initially determined (SEQ ID NO:2). Translation start and stop
codons are underlined. Selected restriction sites are indicated.

Figure 3 presents the amino acid sequence comparison between full-length
p-hydroxyphenylpyruvate dioxygenases from Arabidopsis thaliana (SEQ ID
NO:15) and Zea mays (SEQ ID NO:11) and the p-hydroxyphenylpyruvate
dioxygenase enzymes derived from human (SEQ ID NO:6, GenBank Acc.
No. U29895), pig (SEQ ID NO:7, GenBank Acc. No. D13390), mouse (SEQ ID
NO:8, GenBank Acc. No. D29987) and rat (SEQ ID NO:9, GenBank Acc.
No. M18405). Asterisks indicate amino acid residues that are conserved across all
six species. This figure was created using the Pileup program of GCG (Program
Manual for the Wisconsin Package, Version 9.0-OpenVMS, December 1996,
Genetics Computer Group, 575 Science Drive, Madison, WI, USA 53711).
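
The asterisk convention used in Figure 3 can be illustrated with a minimal
sketch. The Python function below marks the columns of an existing alignment in
which every sequence carries the same residue. The three short aligned strings are
hypothetical placeholders, not the sequences of SEQ ID NOS:6-9, 11 or 15, and the
function is an illustration only; it is not part of the GCG Pileup program.

```python
def conserved_mask(aligned_seqs, gap="-"):
    """Return a string with '*' under every column conserved in all sequences.

    A column counts as conserved only if all sequences carry the same
    residue there and that residue is not a gap character.
    """
    if not aligned_seqs:
        return ""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")
    mask = []
    for column in zip(*aligned_seqs):
        residues = set(column)
        conserved = len(residues) == 1 and gap not in residues
        mask.append("*" if conserved else " ")
    return "".join(mask)


# Hypothetical toy alignment (placeholder data, not from the sequence listing).
toy_alignment = ["MKT-A", "MRT-A", "MKTQA"]
mask = conserved_mask(toy_alignment)
for seq in toy_alignment:
    print(seq)
print(mask)
```

The sketch assumes the sequences have already been aligned and padded to equal
length; producing the alignment itself is outside its scope.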

Figure 4 is a diagram describing the construction of the intermediate 
plasmid vector pT7BlueR + PDOl. 

Figure 5 is a diagram describing the construction of E. coli expression 
vector pE24CPl. 

Applicants have provided a sequence listing in conformity with "Rules for
the Standard Representation of Nucleotide and Amino Acid Sequences in Patent
Applications" (Annexes I and II to the Decision of the President of the EPO,
published in Supplement No. 2 to OJ EPO, 12/1992) and with 37 C.F.R.
1.821-1.825 and Appendices A and B ("Requirements for Application Disclosures
Containing Nucleotides and/or Amino Acid Sequences").

SEQ ID NO:1 presents a partial nucleic acid sequence of an expressed
sequence tag (EST) bearing GenBank Accession No. T92052 obtained from an
Arabidopsis thaliana cDNA library. This sequence was contained in clone
91B13T7 of the library.

SEQ ID NO:2 presents the initial determination of the nucleic acid sequence
and the deduced amino acid sequence of a cDNA encoding a full-length form of
Arabidopsis thaliana p-hydroxyphenylpyruvate dioxygenase enzyme, as
contained in plasmid pGBPPD2.

SEQ ID NO:3 presents the initially deduced amino acid sequence encoded
by a cDNA for Arabidopsis thaliana p-hydroxyphenylpyruvate dioxygenase
enzyme.

SEQ ID NOS:4 and 5 present the nucleotide sequences of a pair of
complementary oligonucleotides (CAM 32 and CAM 33, respectively) used to
facilitate subcloning and expression of the gene encoding
p-hydroxyphenylpyruvate dioxygenase without the chloroplast transit sequence.

SEQ ID NO:6 presents the amino acid sequence of p-hydroxyphenylpyruvate
dioxygenase enzyme derived from human (GenBank Acc. No. U29895).

SEQ ID NO:7 presents the amino acid sequence of p-hydroxyphenylpyruvate
dioxygenase enzyme derived from pig (GenBank Acc. No. D13390).

SEQ ID NO:8 presents the amino acid sequence of p-hydroxyphenylpyruvate
dioxygenase enzyme derived from mouse (GenBank Acc. No. D29987).

SEQ ID NO:9 presents the amino acid sequence of p-hydroxyphenylpyruvate
dioxygenase enzyme derived from rat (GenBank Acc. No. M18405).

SEQ ID NO:10 presents the nucleic acid sequence and deduced amino acid
sequence of the cloned cDNA encoding the Zea mays p-hydroxyphenylpyruvate
dioxygenase enzyme, as contained in plasmid pMPDO.

SEQ ID NO:11 presents the deduced amino acid sequence of the cloned
cDNA encoding the Zea mays p-hydroxyphenylpyruvate dioxygenase enzyme, as
contained in plasmid pMPDO.

SEQ ID NO:12 presents the nucleic acid sequence and the deduced amino
acid sequence of the truncated form of Arabidopsis thaliana
p-hydroxyphenylpyruvate dioxygenase enzyme as contained in pE24CPl.

SEQ ID NO:13 presents the deduced amino acid sequence of the truncated
form of Arabidopsis thaliana p-hydroxyphenylpyruvate dioxygenase enzyme as
contained in pE24CPl.

SEQ ID NO:14 presents the revised nucleic acid sequence and the deduced
amino acid sequence of the cloned cDNA encoding the full-length Arabidopsis
thaliana p-hydroxyphenylpyruvate dioxygenase enzyme, as contained in plasmid
pGBPPD2.

SEQ ID NO:15 presents the revised amino acid sequence deduced from the
cDNA for the full-length Arabidopsis thaliana p-hydroxyphenylpyruvate
dioxygenase enzyme.

SEQ ID NO:16 presents the nucleic acid sequence determined from a
portion of a cDNA from Vernonia galamenensis, as contained in clone
vsl.pk0015.b2.

DETAILS OF THE INVENTION
BIOLOGICAL DEPOSITS

The following biological materials have been deposited under the terms of
the Budapest Treaty at American Type Culture Collection (ATCC), 12301
Parklawn Drive, Rockville, MD 20852, and bear the following accession
numbers:

Depositor Identification           Int'l. Depository
Host Strain          Plasmid       Accession Number     Date of Deposit

E. coli BL21(DE3)    pE24CPl       ATCC 98083           June 25, 1996
N/A                  pGBPPD2       ATCC 97622           June 25, 1996
N/A                  pMPDO         ATCC 209120          June 12, 1997

Definitions 

In the context of this disclosure, a number of terms shall be utilized. As
used herein, the term "nucleic acid" refers to a large molecule which can be
single-stranded or double-stranded, composed of monomers (nucleotides)
containing a sugar, phosphate and either a purine or pyrimidine. A "nucleic acid
fragment" is a portion of a given nucleic acid molecule. As used herein, "DNA"
(deoxyribonucleic acid) is the genetic material, whereas "RNA" (ribonucleic acid)
is involved in the transfer of the information encoded by the DNA into proteins
and polypeptides. A "genome" is the entire body of genetic material contained in
each cell of an organism. The term "nucleotide sequence" refers to a polymer of
DNA or RNA which can be single- or double-stranded, optionally containing
synthetic, non-natural or altered nucleotide bases capable of incorporation into
DNA or RNA polymers.

As used herein, "essentially similar" refers to DNA sequences that may
involve base changes that do not cause a change in the encoded amino acid or
which involve base changes which may alter one or more amino acids, but do not
affect the functional properties of the protein encoded by the DNA sequence. It is
therefore understood that the invention encompasses more than the specific
exemplary sequences. Modifications to the sequence, such as deletions,
insertions, or substitutions in the sequence which produce "silent changes" (i.e.,
those that do not substantially affect the functional properties of the resulting
protein molecule) are also contemplated. For example, alteration(s) in the gene
sequence which reflect the degeneracy of the genetic code, or which result in the
production of a chemically equivalent amino acid at a given site, are
contemplated; thus, a codon for the amino acid alanine, a hydrophobic amino acid,
may be substituted by a codon encoding another less hydrophobic residue, such as
glycine, or a more hydrophobic residue, such as valine, leucine, or isoleucine.
Similarly, changes which result in substitution of one negatively charged residue
for another, such as aspartic acid for glutamic acid, or one positively charged
residue for another, such as lysine for arginine, can also be expected to produce a
biologically equivalent product. Nucleotide changes which result in alteration of
the N-terminal and C-terminal portions of the protein molecule would also not be
expected to alter the activity of the protein. In some cases, it may in fact be
desirable to make mutants of the sequence in order to study the effect of alteration
on the biological activity of the protein. Each of the proposed modifications is
well within the routine skill in the art, as is determination of retention of
biological activity of the encoded products. Moreover, the skilled artisan
recognizes that "essentially similar" sequences encompassed by this invention are
also defined by their ability to hybridize, under stringent conditions (0.1X SSC,
0.1% SDS, 65°C), with the sequences exemplified herein.
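
The kinds of base changes described above can be illustrated with a short
sketch. The Python code below builds the standard genetic code and labels a single
codon change as synonymous (same amino acid, reflecting the degeneracy of the
code), conservative (within one of the residue groups named in the preceding
paragraph), or other. The function name, the labels, and the grouping table are
assumptions made for illustration; whether a given change actually preserves
function must still be determined experimentally.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks stop.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Residue groups taken from the examples in the text (illustrative only):
# Ala/Gly/Val/Leu/Ile, Asp/Glu, and Lys/Arg.
EQUIVALENT_GROUPS = [set("AGVLI"), set("DE"), set("KR")]


def classify_change(codon_before, codon_after):
    """Label a codon substitution as 'synonymous', 'conservative' or 'other'."""
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    if aa_before == aa_after:
        return "synonymous"
    if any(aa_before in g and aa_after in g for g in EQUIVALENT_GROUPS):
        return "conservative"
    return "other"


print(classify_change("GCT", "GCC"))  # alanine -> alanine
print(classify_change("GAT", "GAA"))  # aspartic acid -> glutamic acid
print(classify_change("GCT", "GAT"))  # alanine -> aspartic acid
```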

"Gene" refers to a nucleic acid fragment that encodes a specific protein, 

including regulatory sequences preceding (5' non-coding) and following
(3' non-coding) the coding region. "Native" gene refers to the gene as found in nature
with its own regulatory sequences. "Chimeric" gene refers to a gene comprising
heterogeneous regulatory and coding sequences. "Endogenous" gene refers to the
native gene normally found in its natural location in the genome. A "foreign"
gene refers to a gene not normally found in the host organism but that is
introduced by gene transfer.

"Coding sequence" refers to a DNA sequence that codes for a specific 
protein and excludes the non-coding sequences. 

"Initiation codon" and "termination codon" refer to a unit of three adjacent 

nucleotides in a coding sequence that specifies initiation and termination,
respectively, of protein synthesis (mRNA translation). "Open reading frame"
refers to the amino acid sequence encoded between translation initiation and
termination codons of a coding sequence.

"RNA transcript" refers to the product resulting from RNA
polymerase-catalyzed transcription of a DNA sequence. When the RNA transcript is a perfect
complementary copy of the DNA sequence, it is referred to as the primary
transcript; otherwise it may be an RNA sequence derived from posttranscriptional
processing of the primary transcript. "Messenger RNA" (mRNA) refers to RNA
that can be translated into protein by the cell. "cDNA" refers to a double-stranded
DNA, one strand of which is complementary to and derived from mRNA by
reverse transcription. "Sense RNA" refers to RNA transcript that includes the
mRNA.
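
The definitions of initiation codon, termination codon and complementary
strand given above can be made concrete with a minimal sketch. The Python
functions below locate the first ATG-to-stop coding sequence in a DNA string and
compute the complementary strand, as for the second strand of a cDNA. The function
names and the short input string are hypothetical and chosen only for
illustration; real cDNAs call for additional checks (for example, choosing among
several candidate initiation codons).

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def find_coding_sequence(dna):
    """Return the first sequence running from an ATG initiation codon to an
    in-frame termination codon (inclusive), or None if there is none."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Step through the sequence three nucleotides at a time, in frame.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        start = dna.find("ATG", start + 1)
    return None


# Hypothetical toy sequence, not taken from the sequence listing.
toy = "CCATGGCTTAAGG"
coding = find_coding_sequence(toy)
print(coding)
print(reverse_complement(coding))
```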

As used herein, "regulatory sequences" are nucleotide sequences, located
upstream (5'), within, or downstream (3') to a coding sequence, that control
the transcription or expression of that coding sequence. They act in conjunction
with the protein biosynthetic apparatus of the cell and include promoters,
translation leader sequences, transcription termination sequences, and
polyadenylation sequences.

"Promoter" refers to a DNA sequence in a gene, usually upstream (5') to its
coding sequence, which controls the expression of the coding sequence by
providing the recognition for RNA polymerase and other factors required for
proper transcription. A promoter may also contain DNA sequences that are
involved in the binding of protein factors which control the effectiveness of
transcription initiation in response to physiological or developmental conditions.
In the case of eukaryotic organisms, it may also contain enhancer elements.

An "enhancer element" is a DNA sequence which can stimulate promoter
activity. It may be an innate element of the promoter or a heterologous element
inserted to enhance the activity level and tissue-specificity of a promoter.
"Constitutive promoters" refers to those promoters that direct gene
expression in all tissues and at all times. "Organ-specific" or
"development-specific" promoters as referred to herein are those that direct gene expression
almost exclusively in specific organs, such as leaves or seeds, or at specific
development stages in an organ, such as in early or late embryogenesis,
respectively.

The term "operably linked" refers to nucleic acid sequences on a single
nucleic acid molecule which are associated so that the function of one is affected
by the other. For example, a promoter is operably linked with a structural gene
(i.e., a gene encoding p-hydroxyphenylpyruvate dioxygenase, as disclosed herein)
when it is capable of affecting the expression of that structural gene (i.e., that the
structural gene is under the transcriptional control of the promoter).

The term "expression", as used herein, is intended to mean the production of
the protein product encoded by a gene. More particularly, "expression" refers to
the transcription and stable accumulation of the sense RNA (mRNA) derived from
the nucleic acid fragment(s) of the invention that, in conjunction with the protein
apparatus of the cell, results in altered levels of protein product.
"Overexpression" refers to the production of a gene product in transgenic
organisms that exceeds levels of production in normal or non-transformed
organisms. "Altered levels" refers to the production of gene product(s) in
transgenic organisms in amounts or proportions that differ from that of normal or
non-transformed organisms. "Facilitating expression" refers to steps and
conditions for culturing host cells containing the desirable gene to yield an
increased production of the enzyme. For example, addition of a chemical inducer
specific to the particular promoter operably linked to the gene facilitates
expression of the encoded enzyme. This is measured relative to the production
levels of an untreated gene.

The "3' non-coding sequences" refers to the DNA sequence portion of a
gene that contains a polyadenylation signal and any other regulatory signal
capable of affecting mRNA processing or gene expression. The polyadenylation
signal is usually characterized by affecting the addition of polyadenylic acid tracts
to the 3' end of the mRNA precursor.

The "translation leader sequence" refers to that DNA sequence portion of a 
gene between the promoter and coding sequence that is transcribed into RNA and 
is present in the fully processed mRNA upstream (5') of the translation start 
codon. The translation leader sequence may affect processing of the primary
transcript to mRNA, mRNA stability, or translation efficiency.

"Transformation" herein refers to the transfer of a foreign gene into the 
genome of a host organism and its genetically stable inheritance. Bacterial 
transformation can proceed by any of several methods well known in the art,
including calcium chloride-mediated transformation and electroporation.
Examples of methods of plant transformation include Agrobacterium-mediated
transformation and particle-accelerated or "gene gun" transformation technology
(U.S. Patent No. 4,945,050).

"Host cell" refers to the cell that is transformed with the introduced genetic 
material. 

"Plasmid vector" refers to a double-stranded, closed circular,
extrachromosomal DNA molecule.

"Tolerant" or "tolerance" refers to a condition whereby a cell or an organism
is able to withstand the effect of application of a compound or composition at a 
concentration or application rate that causes a demonstrable effect in or against 
cells or organisms that are not tolerant. For example, the growth or survival of a 
plant that is tolerant to application of a herbicidal compound or composition will 
be less affected than the growth or survival of a plant that is not tolerant to 
application of the herbicidal compound or composition. 

Cloning of Plant Genes Encoding p-Hydroxyphenylpyruvate Dioxygenase

The p-hydroxyphenylpyruvate dioxygenases from plants are a promising
new class of targets for new herbicidal compounds. In order to be able to study
this enzyme in detail, and to have available supplies of enzyme for inhibitor
screening, cDNA clones encoding plant p-hydroxyphenylpyruvate dioxygenases
were identified. These nucleic acid fragments are useful for the production of
their encoded enzymes, for isolation of clones from additional plant sources that
encode other p-hydroxyphenylpyruvate dioxygenase enzymes, and for
understanding the biochemical and structural properties of these enzymes.

Nucleic acid fragments comprising nucleotide sequences that encode
different forms of the enzyme p-hydroxyphenylpyruvate dioxygenase from the
plant Arabidopsis thaliana have now been isolated. Subsequently, these
nucleotide sequences were expressed in E. coli cells and shown to direct the
synthesis of plant p-hydroxyphenylpyruvate dioxygenase enzymes.

An automated search of nucleotide sequences contained in a database
representing an Arabidopsis cDNA library for sequences homologous to other
known, non-plant p-hydroxyphenylpyruvate dioxygenase genes revealed the
plasmid cDNA clone 91B13T7. This cDNA was obtained from the Arabidopsis
Seed Stock Center at Ohio State University. Plasmid DNA suitable for nucleotide
sequence determination was prepared and the nucleotide sequence of the plasmid
insert was determined. The resulting sequence was not interpretable, suggesting
possible contamination of the plasmid sample by an extraneous nucleic acid. This
assumption was confirmed by digesting the plasmid DNA sample with restriction
enzymes and separating the resulting nucleic acid fragments by agarose gel
electrophoresis. This analysis revealed the presence of nucleic acid fragments that
could not be derived from the plasmid carrying the putative
p-hydroxyphenylpyruvate dioxygenase fragment. Furthermore, a search of the publicly available
nucleic acid sequence databases revealed that the Arabidopsis thaliana sequence
reported for cDNA clone 91B13T7 corresponded to a truncated cDNA (Figure 1).
Based on publicly available mammalian cDNA sequence information for
p-hydroxyphenylpyruvate dioxygenase, the minimum length expected for a cDNA
encoding a complete p-hydroxyphenylpyruvate dioxygenase enzyme is 1 kb
(Table 1).

Table 1

Predicted cDNA Length for Sequences
Encoding p-Hydroxyphenylpyruvate Dioxygenase

Organism            Amino Acid Residues    Minimum cDNA (kb)

Human               392                    1.176
Pig                 392                    1.176
Pseudomonas sp.     357                    1.071
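
The minimum cDNA lengths in Table 1 follow from three nucleotides per amino
acid residue. The short Python sketch below reproduces that arithmetic; the
function name is an assumption made for illustration, and, like the table, the
calculation ignores the termination codon and any untranslated regions.

```python
def minimum_cdna_kb(amino_acid_residues):
    """Minimum coding length in kilobases: three nucleotides per residue."""
    return amino_acid_residues * 3 / 1000


# Residue counts as listed in Table 1.
table_1 = [("Human", 392), ("Pig", 392), ("Pseudomonas sp.", 357)]
for organism, residues in table_1:
    print(f"{organism}: {residues} residues, {minimum_cdna_kb(residues):.3f} kb")
```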




Therefore, based on the expected length of a cDNA capable of encoding a
functional p-hydroxyphenylpyruvate dioxygenase, the Arabidopsis thaliana
sequence obtained from the public database was insufficient to encode a
full-length, active p-hydroxyphenylpyruvate dioxygenase enzyme. Therefore, a cDNA
with the capacity to encode a full-length Arabidopsis thaliana enzyme was cloned,
as described herein. A 400 bp segment of the insert of plasmid 91B13T7 was
liberated by digestion with restriction enzymes and used to screen a cDNA library
prepared from norflurazon-treated Arabidopsis thaliana seedlings (Scolnik, P. A.,
and Bartley, G. E. (1994) Plant Physiol. 104:1469-1470). Several clones showing
positive hybridization to this probe were sequenced. The initial determination of
the sequence of the longest cDNA clone obtained from this effort is shown in
Figure 2 and in SEQ ID NO:2. During the course of subsequent work with this
clone it became necessary to confirm certain features of the sequence. A corrected
sequence of this cDNA is presented in SEQ ID NO:12.

The sequence reported in Figure 2 indicates that this cDNA has the capacity
to encode a protein of MW 48,841 which, as shown in Figure 3, has a high level
of homology to p-hydroxyphenylpyruvate dioxygenase enzymes from other
eukaryotes.

A cDNA capable of encoding a full-length p-hydroxyphenylpyruvate
dioxygenase has also been obtained from corn. This cDNA, contained in plasmid
pMPDO, was identified in a corn cDNA library using an approximately 900 base
pair portion of the Arabidopsis cDNA as a probe. The predicted amino acid
sequence that is encoded by the corn cDNA is also compared to
p-hydroxyphenylpyruvate dioxygenase enzymes from other eukaryotes in Figure 3.

A cDNA library was prepared from messenger RNA isolated from
developing seeds of Vernonia galamenensis. Random sequencing of the clones
contained in the library identified a probable clone, designated vsl.pk0015.b2, for
the p-hydroxyphenylpyruvate dioxygenase from this plant. The 513 bp expressed
sequence tag (EST) is presented in SEQ ID NO:16.

Expression of the Arabidopsis thaliana cDNA Encoding p-Hydroxyphenylpyruvate
Dioxygenase in E. coli

The nucleic acid fragments of the instant invention encoding plant
p-hydroxyphenylpyruvate dioxygenase enzymes can be operably linked to suitable
regulatory sequences, thereby creating chimeric genes that can be used to direct
expression of the enzyme in transgenic organisms. These transgenic organisms
include, but are not limited to: plants (Plant Molecular Biology; Croy, R. R. D.,
Ed.; Bios Scientific Publishers; 1993); microorganisms, including Escherichia
coli (Gold, L. (1990) Methods in Enzymology 185:11), Bacillus subtilis (Henner,
D. J. (1990) Methods in Enzymology 185:199), yeast (Gellissen, G., et al. (1992)
Antonie Leeuwenhoek 62:79), and fungi, including members of the genus
Aspergillus (Devchand, M. and Gwynne, D. I. (1991) J. Biotechnol. 17:3); and
insect cells containing recombinant baculoviruses (Luckow, V. A. and Summers,
M. D. (1988) Bio/Technology 6:47).

One skilled in the art can isolate the coding sequences from the fragments of
the invention by using or creating sites for restriction endonucleases, as described
in Sambrook, J., et al. ((1989) Molecular Cloning: A Laboratory Manual, 2nd ed.;
Cold Spring Harbor Laboratory Press; hereinafter "Maniatis"). Alternatively,
polymerase chain reaction (PCR) techniques can be employed to isolate and/or
modify the fragments of the invention (Newton, C. R. and Graham, A. (1994)
PCR; Bios Scientific Publishers).

Arabidopsis p-hydroxyphenylpyruvate dioxygenase was expressed in E. coli
under control of a T7 promoter in a strain expressing T7 RNA polymerase
(Studier, F. W., et al. (1990) Methods in Enzymology 185:60). Promoters other
than T7 are commonly used in expression vectors and could be substituted for
protein expression in E. coli. Examples of alternative promoters include, but are
not limited to, trp (Yansura, D. G. and Henner, D. J. (1990) Methods in
Enzymology 185:54), PL (Remaut, E. et al. (1981) Gene 15:81), tac (Amann, E. et
al. (1983) Gene 25:167), trc (Amann, E. et al. (1988) Gene 69:301), and
promoters such as lacUV5, lpp, PR, and hybrid and tandem promoters constructed
to combine specific features to increase strength or regulation capacity (Balbas, P.
and Bolivar, F. (1990) Methods in Enzymology 185:14).

Biochemical Evidence of Enzymatic Function 

The enzyme p-hydroxyphenylpyruvate dioxygenase catalyzes the reaction of
p-hydroxyphenylpyruvate with molecular oxygen to give homogentisate and CO2.
The enzyme can be assayed by measuring oxygen utilization (Hager, S. E., et al.
(1957) J. Biol. Chem. 225:935-947), CO2 release or homogentisate production
from radioactive labeled p-hydroxyphenylpyruvate (Lindblad, B. (1971) Clin.
Chim. Acta 34:113-121), loss of the p-hydroxyphenylpyruvate (Lin, E. C. C. et al.
(1958) J. Biol. Chem. 233:668-673), or formation of homogentisate using a
colorimetric assay (Fellman, J. H. et al. (1972) Biochim. Biophys. Acta
284:90-100) or UV detection following HPLC or a similar chromatographic
separation technique. The activity of p-hydroxyphenylpyruvate dioxygenase may
also be measured in a coupled assay in which the initial product, homogentisate, is
oxidized by homogentisate dioxygenase; formation of maleylacetoacetate is
determined by measuring absorbance at 330 nm (Fernandez-Canon, J. M. and
Penalva, M. A. (1997) Anal. Biochem. 245:218-221).
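
For the coupled assay, an absorbance change at 330 nm can be converted to a
reaction rate with the Beer-Lambert relation. The Python sketch below shows that
conversion under stated assumptions: the molar absorption coefficient of
maleylacetoacetate is not given in this text and must be supplied by the user
(from the cited reference or by calibration), and the value used in the example
call is a hypothetical placeholder, as is the function name.

```python
def rate_micromolar_per_min(delta_a_per_min, epsilon_per_molar_cm, path_cm=1.0):
    """Convert an absorbance slope (A330 per minute) to micromolar per minute.

    Beer-Lambert: A = epsilon * c * l, so dc/dt = (dA/dt) / (epsilon * l).
    epsilon_per_molar_cm is user-supplied; it is NOT specified in the text.
    """
    if epsilon_per_molar_cm <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    molar_per_min = delta_a_per_min / (epsilon_per_molar_cm * path_cm)
    return molar_per_min * 1e6


# Placeholder coefficient for illustration only (hypothetical value).
HYPOTHETICAL_EPSILON = 10000.0
print(rate_micromolar_per_min(0.1, HYPOTHETICAL_EPSILON))
```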

An alternative to any of the kinetic assays for p-hydroxyphenylpyruvate 
dioxygenase is an end-point or fixed-time assay. The procedure is based on the 
conversion of unconverted substrate, p-hydroxyphenylpyruvate, to its enediol 
tautomer by tautomerase in the presence of borate ions and measurement of the 
characteristic 308 nm peak of the tautomer (Lin, E. C. C. et al. (1958) J. Biol. 
Chem. 233:668-673). The procedure involves the addition of enough 
p-hydroxyphenylpyruvate dioxygenase to consume approximately 80% of the organic 
substrate over 1 hour in 200 µL of assay buffer, which in this case is 50 mM Tris, 
pH 7.4, 0.10 mM p-hydroxyphenylpyruvic acid, 1.75 mM ascorbate and 1.25 mM EDTA. 
After 1 hr the reaction is quenched by the addition of 100 µL of 0.8 M borate, 
pH 7.3, containing 1000 ppb of a p-hydroxyphenylpyruvate dioxygenase inhibitor 
and 0.25 µL of 6.1 mg/mL tautomerase. The absorbance at 308 nm is read after 
a 30 min incubation and is stable thereafter for 2 hr. The advantage of this assay 
over the kinetic procedure is that the p-hydroxyphenylpyruvate dioxygenase is not 
required to oxidize the substrate in the presence of high concentrations of borate, a 
condition that might interfere with the mode of action of inhibitors. Furthermore, 
the assay produces essentially a stable binary indication of 
p-hydroxyphenylpyruvate dioxygenase inhibition, and is well suited for applications 
that require a high throughput of samples and assays. 
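
The quantities in the fixed-time assay can be tallied with a short calculation. The following Python sketch uses only the volumes and concentrations quoted above. The function names are illustrative, and the percent-inhibition formula is an assumed, conventional way of turning 308 nm readings into a readout; it is not a formula stated in this specification.

```python
# Bookkeeping for the fixed-time assay described above.
# Numeric inputs come from the text; all names are illustrative only.

ASSAY_VOLUME_UL = 200.0     # assay buffer volume
SUBSTRATE_MM = 0.10         # p-hydroxyphenylpyruvate in the assay
QUENCH_VOLUME_UL = 100.0    # borate quench volume
BORATE_M = 0.8              # borate concentration in the quench solution
INHIBITOR_PPB = 1000.0      # inhibitor concentration in the quench solution


def substrate_nmol(volume_ul: float, conc_mm: float) -> float:
    """Amount of substrate in nmol (1 mM x 1 uL = 1 nmol)."""
    return volume_ul * conc_mm


def after_quench(conc: float, added_ul: float, start_ul: float) -> float:
    """Concentration of a quench component once diluted into the assay."""
    return conc * added_ul / (added_ul + start_ul)


def percent_inhibition(a_sample: float, a_uninhibited: float,
                       a_no_enzyme: float) -> float:
    """Assumed readout: A308 tracks substrate that was NOT consumed, so a
    fully inhibited well reads like the no-enzyme control (100%) and an
    uninhibited well reads like the enzyme-only control (0%)."""
    span = a_no_enzyme - a_uninhibited
    if span <= 0:
        raise ValueError("no-enzyme control must read higher than the uninhibited control")
    return 100.0 * (a_sample - a_uninhibited) / span


if __name__ == "__main__":
    total = substrate_nmol(ASSAY_VOLUME_UL, SUBSTRATE_MM)
    print(f"substrate per well: {total:.1f} nmol")
    print(f"consumed at 80% conversion: {0.8 * total:.1f} nmol")
    print(f"borate after quench: {after_quench(BORATE_M, QUENCH_VOLUME_UL, ASSAY_VOLUME_UL):.3f} M")
    print(f"inhibitor after quench: {after_quench(INHIBITOR_PPB, QUENCH_VOLUME_UL, ASSAY_VOLUME_UL):.0f} ppb")
```

Because the enzyme acts before borate is added, the final borate and inhibitor concentrations computed here apply only to the tautomerase read-out step, which is the design point the paragraph above emphasizes.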

The enzyme encoded by the nucleic acid fragments and overexpressed in 
E. coli can be extracted in any conventional buffer used for extracting soluble 
plant enzymes. Although a large amount of an overexpressed protein is often 
insoluble, the amount that is soluble can represent as much as 50% of 
the total soluble protein. Soluble overexpressed protein has high 
p-hydroxyphenylpyruvate dioxygenase activity and is easily extracted. Likewise, it 
may be possible to resolubilize an insoluble overexpressed protein in an active form 
under appropriate conditions, since addition of sarkosyl (sodium N-lauroylsarcosinate) 
to the extraction buffer appeared to increase the amount of the overexpressed 
protein extracted. For optimum activity, a reducing agent such as ascorbate or 
reduced glutathione should be present as well as a source of ferrous ion. 

An overexpressed enzyme can be assayed using all the techniques 
described above for measuring p-hydroxyphenylpyruvate dioxygenase activity, 
while only the techniques using labeled p-hydroxyphenylpyruvate can be used to 
measure activity in crude plant extracts. Therefore, the availability of an 
overexpressed enzyme greatly facilitates the development of high capacity screens 
to identify inhibitors of the enzyme. Potential inhibitors are evaluated for their 
capacity to reduce the rate of the reaction of the enzyme, resulting in reduced 
oxygen uptake and CO2 release, and lower rates of formation of homogentisate 
and loss of p-hydroxyphenylpyruvate. Applicants have demonstrated that at least 
one of the instant nucleic acid fragments can be overexpressed in E. coli cells, 
resulting in production of a protein that catalyzes the conversion of 
p-hydroxyphenylpyruvate to homogentisate with the release of CO2. Furthermore, it 
has been shown that this activity is inhibited by commercial herbicides known to 
inhibit p-hydroxyphenylpyruvate dioxygenase. Finally, an overexpressed enzyme 
can be used in a high capacity assay to identify compounds that inhibit the 
enzymatic activity of p-hydroxyphenylpyruvate dioxygenase. Such compounds 
may serve as herbicides. 
Preparation of Plants Tolerant to Inhibitors of p-Hydroxyphenylpyruvate 
Dioxygenase 

This invention embodies plants which are resistant or at least tolerant to 
herbicides that target the p-hydroxyphenylpyruvate dioxygenase enzyme at levels 
which are normally inhibitory to the naturally occurring p-hydroxyphenylpyruvate 
dioxygenase enzyme. This altered p-hydroxyphenylpyruvate dioxygenase activity 
is conferred by (1) overexpression of the wild-type p-hydroxyphenylpyruvate 
dioxygenase enzyme, or (2) expression of a DNA molecule encoding a 
herbicide-tolerant enzyme. The said enzyme may be a modified form of a 
p-hydroxyphenylpyruvate dioxygenase enzyme that occurs naturally in a eukaryote or 
prokaryote, or a modified form of a p-hydroxyphenylpyruvate dioxygenase 
enzyme that naturally occurs in a plant, or a herbicide-tolerant enzyme that 
naturally occurs in a prokaryote (Duke et al., Herbicide Resistant Crops; Lewis: 
Boca Raton, 1994). An effective amount of gene expression to render the cells of 
the plant tissue substantially tolerant to the herbicide depends on whether the gene 
codes for an unaltered p-hydroxyphenylpyruvate dioxygenase enzyme or a mutant or 
altered form of the enzyme that is less sensitive to the herbicides. For an 
unaltered plant p-hydroxyphenylpyruvate dioxygenase gene, an effective 
amount of expression is that amount that provides for a 2- to 10-fold increase in 
herbicide tolerance. Plants encompassed by the invention include monocotyledonous 
and dicotyledonous plants. Preferred are those plants which would be potential 
targets for p-hydroxyphenylpyruvate dioxygenase-inhibiting herbicides, 
particularly agronomically important crops such as maize and other cereal crops. 

Increased levels of expression of p-hydroxyphenylpyruvate dioxygenase 
activity, from two to ten or more times the natively expressed amount, would be 
sufficient to overcome growth inhibition caused by the herbicide. Plants 
containing such altered p-hydroxyphenylpyruvate dioxygenase enzyme activity 
can be obtained by direct selection in plants. This method is known in the art. 
See, e.g., U.S. Patent No. 5,162,602, U.S. Patent No. 4,761,373, and references 
cited therein. 

Overexpression of p-hydroxyphenylpyruvate dioxygenase also can be 
accomplished by stably transforming a host plant cell with a chimeric DNA 
molecule comprising a promoter capable of driving expression of an associated 
coding sequence in a plant cell and operably linked to a homologous or 
heterologous coding sequence encoding p-hydroxyphenylpyruvate dioxygenase. 
A "homologous" p-hydroxyphenylpyruvate dioxygenase gene is isolated from an 
organism taxonomically identical to the target plant cell, whereas a "heterologous" 
p-hydroxyphenylpyruvate dioxygenase gene is obtained from an organism 
taxonomically distinct from the target plant. 

The expression of foreign genes in plants is well-established (De Blaere et 
al. (1987) Meth. Enzymol. 143:277-291). Promoters utilized to drive gene 
expression in transgenic plants or plant cells (i.e., those capable of driving 
expression of the associated coding sequences such as p-hydroxyphenylpyruvate 
dioxygenase in plant cells) include those directing the 19S and 35S transcripts in 
Cauliflower mosaic virus (Odell et al., (1985) Nature 313:810-812; Hull et al., 
(1987) Virology 86:482-493), small subunit of ribulose 1,5-bisphosphate 
carboxylase (Morelli et al. (1985) Nature 315:200-204; Broglie et al. (1984) 
Science 224:838-843; Herrera-Estrella et al., (1984) Nature 310:115-120; Coruzzi 
et al. (1984) EMBO J. 3:1671-1679; Facciotti et al., (1985) Bio/Technology 3:241) 
and chlorophyll a/b binding protein (Lamppa et al., (1986) Nature 316:750-752); 
and nopaline synthase promoters (Depicker et al. (1982) J. Mol. Appl. Genet. 
1:561-573; An et al. (1990) Plant Cell 2:225-233). The chimeric DNA 
construct(s) of the invention may contain multiple copies of a promoter or 
multiple copies of the p-hydroxyphenylpyruvate dioxygenase coding sequences. 
In addition, the construct(s) may include coding sequences for selectable markers 
and coding sequences for other peptides such as signal or transit peptides. The 
preparation of such constructs is within the ordinary level of skill in the art. 
Resistance to inhibitors of the plant carotenoid biosynthesis pathway, which is 
also targeted by p-hydroxyphenylpyruvate dioxygenase inhibitors, has been 
achieved by expressing a bacterial gene encoding phytoene desaturase driven by 
the CaMV promoter (Misawa et al., (1994) Plant J. 6:481-490). 

Transit peptides may be fused to the p-hydroxyphenylpyruvate dioxygenase 
coding sequence in the chimeric DNA constructs of the invention to direct 
transport of the expressed p-hydroxyphenylpyruvate dioxygenase enzyme to the 
desired site of action. Examples of transit peptides include the chloroplast transit 
peptides such as those described in Von Heijne et al., (1991) Plant Mol. Biol. Rep. 
9:104-126; Mazur et al., (1987) Plant Physiol. 85:1110; Vorst et al., (1988) Gene 
65:59; and mitochondrial transit peptides such as those described in Boutry et al., 
(1987) Nature 328:340-342. 

It is envisioned that the introduction of enhancers or enhancer-like elements 
into other promoter constructs will also provide increased levels of primary 
transcription to accomplish the invention. These would include viral enhancers 
such as that found in the 35S promoter (Odell et al., (1988) Plant Mol. Biol. 
10:263-272), enhancers from the opine genes (Fromm et al., (1989) Plant Cell 
1:977-984), or enhancers from any other source that result in increased 
transcription when placed into a promoter operably linked to the nucleic acid 
fragment of the invention. 

Introns isolated from the maize Adh-1 and Bz-1 genes (Callis et al., (1987) 
Genes Dev. 1:1183-1200), and intron 1 and exon 1 of the maize Shrunken-1 (sh-1) 
gene (Maas et al., (1991) Plant Mol. Biol. 16:199-207) may also be of use to 
increase expression of introduced genes. Results with the first intron of the maize 
alcohol dehydrogenase (Adh-1) gene indicate that when this DNA element is 
placed within the transcriptional unit of a heterologous gene, mRNA levels can be 
increased by 6.7-fold over normal levels. Similar levels of intron enhancement 
have been observed using intron 3 of a maize actin gene (Luehrsen, K. R. and 
Walbot, V., (1991) Mol. Gen. Genet. 225:81-93). Enhancement of gene 
expression by Adh1 intron 6 (Oard et al., (1989) Plant Cell Rep. 8:156-160) has 
also been noted. Exon 1 and intron 1 of the maize sh-1 gene have been shown to 
individually increase expression of reporter genes in maize suspension cultures by 
10- and 100-fold, respectively. When used in combination, these elements have 
been shown to produce up to 1000-fold stimulation of reporter gene expression 
(Maas et al., (1991) Plant Mol. Biol. 16:199-207). 

Any 3' non-coding region capable of providing a polyadenylation signal and 
other regulatory sequences that may be required for proper expression can be used 
to accomplish the invention. This would include the 3' end from any storage 
protein such as the 3' end of the 10 kd, 15 kd, 27 kd and alpha zein genes, the 3' end 
of the bean phaseolin gene, the 3' end of the soybean β-conglycinin gene, the 3' 
end from viral genes such as the 3' end of the 35S or the 19S cauliflower mosaic 
virus transcripts, the 3' end from the opine synthesis genes, the 3' ends of ribulose 
1,5-bisphosphate carboxylase or chlorophyll a/b binding protein, or 3' end 
sequences from any source such that the sequence employed provides the 
necessary regulatory information within its nucleic acid sequence to result in the 
proper expression of the promoter/coding region combination to which it is 
operably linked. There are numerous examples in the art that teach the usefulness 
of different 3' non-coding regions (for example, see Ingelbrecht et al., (1989) 
Plant Cell 1:671-680). 

Various methods of introducing a DNA sequence (i.e., of transforming) into 
eukaryotic cells of higher plants are available to those skilled in the art (see EPO 
publications 0 295 959 A2 and 0 138 341 A1). Such methods include 
high-velocity ballistic bombardment with metal particles coated with the nucleic acid 
constructs (see Klein et al., (1987) Nature (London) 327:70-73, and see U.S. 
Patent No. 4,945,050), as well as those using transformation vectors based on 
the Ti and Ri plasmids of Agrobacterium spp., particularly the binary type of these 
vectors. Ti-derived vectors transform a wide variety of higher plants, including 
monocotyledonous and dicotyledonous plants, such as soybean, cotton and 
rapeseed (Facciotti et al., (1985) Bio/Technology 3:241; Byrne et al., (1987) Plant 
Cell, Tissue and Organ Culture 8:3; Sukhapinda et al., (1987) Plant Mol. Biol. 
8:209-216; Lorz et al., (1985) Mol. Gen. Genet. 199:178-182; Potrykus et al., 
(1985) Mol. Gen. Genet. 199:183-188). 

Other transformation methods are available to those skilled in the art, such 
as direct uptake of foreign DNA constructs (see EPO publication 0 295 959 A2), 
and techniques of electroporation (see Fromm et al., (1986) Nature (London) 
319:791-793). Once transformed, the cells can be regenerated by those skilled in 
the art. Also relevant are several recently described methods of introducing 
nucleic acid fragments into commercially important crops, such as rapeseed (see 
De Block et al., (1989) Plant Physiol. 91:694-701), sunflower (Everett et al., 
(1987) Bio/Technology 5:1201-1204), soybean (McCabe et al., (1988) 
Bio/Technology 6:923-926; Hinchee et al., (1988) Bio/Technology 6:915-922; 
Chee et al., (1989) Plant Physiol. 91:1212-1218; Christou et al., (1989) Proc. 
Natl. Acad. Sci. USA 86:7500-7504; EPO Publication 0 301 749 A2), and corn 
(Gordon-Kamm et al., (1990) Plant Cell 2:603-618; and Fromm et al., (1990) 
Bio/Technology 8:833-839). 

Altered p-hydroxyphenylpyruvate dioxygenase enzyme activity may also be 
achieved through the generation or identification of modified forms of the isolated 
eukaryotic p-hydroxyphenylpyruvate dioxygenase coding sequence having at least 
one amino acid substitution, addition or deletion which encodes an altered 
p-hydroxyphenylpyruvate dioxygenase enzyme resistant to a herbicide that 
inhibits the unaltered, naturally occurring form. Genes encoding such enzymes 
can be obtained by numerous strategies known in the art. A first general strategy 
involves direct or indirect mutagenesis procedures on microbes, e.g., E. coli and 
S. cerevisiae (Miller, (1972) Experiments in Molecular Genetics, Cold Spring 
Harbor Laboratory, Cold Spring Harbor, NY; Davis et al., (1980) Advanced 
Bacterial Genetics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY; 
Sherman et al., (1983) Methods in Yeast Genetics, Cold Spring Harbor 
Laboratory, Cold Spring Harbor, NY; and U.S. Patent No. 4,975,374) and 
cyanobacteria (Bryant, The Molecular Biology of Cyanobacteria; Kluwer 
Academic Publishers: Boston, 1995). A second method of obtaining mutant 
herbicide-resistant alleles of the eukaryotic p-hydroxyphenylpyruvate dioxygenase 
enzyme involves direct selection in plants. For example, the effect of inhibitors 
on the growth of plants such as Arabidopsis, soybean, or maize may be 
determined by plating seeds, sterilized by art-recognized methods, on a 
simple minimal salts medium containing increasing concentrations of the 
inhibitor. The lowest dose at which significant growth inhibition can be 
reproducibly detected is used for subsequent experiments. Mutagenesis of plant 
material may be utilized to increase the frequency at which resistant alleles occur 
in the selected population. Mutagenized seed material can be derived from a 
variety of sources, including chemical or physical mutagenesis of seeds, or 
chemical or physical mutagenesis of pollen (Neuffer, In Maize for Biological 
Research. Sheridan, ed. Univ. Press, Grand Forks, ND., pp. 61-64 (1982)), which 
is then used to fertilize plants and the resulting M1 mutant seeds collected. 
Typically, for Arabidopsis, M2 seeds (i.e., progeny seeds of plants grown from 
seeds mutagenized with chemicals, such as ethyl methane sulfonate, or with 
physical agents, such as gamma rays or fast neutrons) are plated at densities of up 
to 10,000 seeds/plate (10 cm diameter) on minimal salts medium containing an 
appropriate concentration of inhibitor. Seedlings that continue to grow and 
remain green 7-21 days after plating are transplanted to soil and grown to maturity 
and seed set. Progeny of these seeds are tested for resistance to the herbicide. If 
the resistance trait is dominant, plants whose seed segregate 3:1 
(resistant:sensitive) are presumed to have been heterozygous for the resistance at 
the M2 generation. Plants that give rise to all resistant seed are presumed to have 
been homozygous for the resistance at the M2 generation. Such mutagenesis on 
intact seeds and screening of their M2 progeny seed can also be carried out on 
other species, for instance soybean (see, e.g., U.S. Patent No. 5,084,082). Mutant 
seeds to be screened for herbicide tolerance can also be obtained as a result of 
fertilization with pollen mutagenized by chemical or physical means. 
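
The 3:1 (resistant:sensitive) expectation for progeny of a heterozygous M2 plant can be checked against seedling counts with a goodness-of-fit test. The Python sketch below is one conventional way to do this and is not a procedure prescribed by this specification; the function names and the example counts are hypothetical, and 3.841 is the standard chi-square critical value for one degree of freedom at the 0.05 level.

```python
# Chi-square goodness-of-fit test for a 3:1 resistant:sensitive segregation.
# Names and example counts are hypothetical illustrations.

CRITICAL_VALUE_DF1_P05 = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05


def chi_square_3_to_1(resistant: int, sensitive: int) -> float:
    """Chi-square statistic comparing observed counts with a 3:1 expectation."""
    total = resistant + sensitive
    if total == 0:
        raise ValueError("at least one seedling must be scored")
    expected_resistant = 0.75 * total
    expected_sensitive = 0.25 * total
    return ((resistant - expected_resistant) ** 2 / expected_resistant
            + (sensitive - expected_sensitive) ** 2 / expected_sensitive)


def fits_heterozygous_parent(resistant: int, sensitive: int) -> bool:
    """True when the counts do not differ significantly from 3:1."""
    return chi_square_3_to_1(resistant, sensitive) < CRITICAL_VALUE_DF1_P05


if __name__ == "__main__":
    for counts in [(75, 25), (60, 40), (100, 0)]:
        stat = chi_square_3_to_1(*counts)
        print(counts, f"chi-square = {stat:.2f}", fits_heterozygous_parent(*counts))
```

A family scoring all resistant fails the 3:1 test, which is the pattern the text attributes to a parent homozygous for a dominant resistance allele.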

EXAMPLE 1 
Cloning of a cDNA for Arabidopsis thaliana 
p-Hydroxyphenylpyruvate Dioxygenase 

The plasmid containing the Arabidopsis thaliana 91B13T7 expressed 
sequence tag (Newman et al. (1994) Plant Physiol. 106:1241-1255) was digested 
with the restriction enzymes BamHI and EcoRI, and the resulting 400 bp fragment 
was used to screen a lambda phage cDNA library of Arabidopsis thaliana 
seedlings (Scolnik, P. A. and Bartley, G. E. (1994) Plant Physiol. 104:1469-1470) 
according to the following protocol. 

E. coli KW251 cells were grown overnight in Luria Broth ("LB") containing 
0.2% maltose and 10 mM MgSO4. Cells were pelleted by centrifugation and 
resuspended in 10 mM MgSO4 to an OD600 of 0.5. Cell aliquots (0.8 mL) were 
mixed with 0.1 mL of diluted phage samples and 7 mL of top agarose (0.7% 
agarose in LB containing 10 mM MgSO4) at 45°C, and plated onto 150 mm Petri 
dishes containing LB agar. Phage plaques became visible in 5-7 h, at which point 
the plates were placed at 4°C. 

Phage plaques were transferred to nitrocellulose filters according to standard 
techniques, and the filters were hybridized to 32P-radiolabeled probe prepared 
according to the method of Feinberg and Vogelstein ((1983) Anal. Biochem. 
132:6-13), using the hybridization conditions of Berlyn et al. ((1989) Proc. Natl. 
Acad. Sci. 86:4604-4608). After exposure to X-ray film for 48 h, 12 positive 
plaques were eluted, plated, and hybridized under the same conditions. A total of 
9 plaques that retained positive signals in this second round of hybridization were 
subjected to in vivo excision using the ExAssist/SOLR™ system according to the 
manufacturer's protocol (Stratagene Cloning Systems, La Jolla, CA). DNA from 
the plasmids resulting from in vivo excision of positive plaques was prepared for 
DNA sequencing using the Wizard Plus™ kit (Promega, Madison, WI). Eight of 
the clones that were sequenced showed strong conservation with available 
p-hydroxyphenylpyruvate dioxygenase sequences, whereas the remaining clone 
did not correspond to a p-hydroxyphenylpyruvate dioxygenase. Alignment with 
known p-hydroxyphenylpyruvate dioxygenase sequences also revealed that two of 
the clones correspond to 0.3 kbp fragments from the 3' end of the transcript, and 
another two to 1.2 kbp fragments from the 5' end of the transcript. One clone of 
each was used to assemble a 1.5 kbp cDNA by ligating at the internal NheI 
restriction site (Figure 1). The initial determination of the DNA sequence (SEQ 
ID NO:2) of the resulting cDNA clone is shown in Figure 2. Subsequent work 
with this DNA fragment required confirmation of some of the features of its 
sequence. Approximately ten nucleotide residues were found to have been listed 
in error. Thus a corrected sequence for this DNA fragment is listed in SEQ ID 
NO:14 and the deduced amino acid sequence is set forth in SEQ ID NO:15. The 
revised sequences form the basis for analyses and comparisons reported herein. 

EXAMPLE 2 
Overexpression of the Arabidopsis cDNA in E. coli 
The deduced amino acid sequence for Arabidopsis p-hydroxyphenylpyruvate 
dioxygenase was aligned with the amino acid sequences of 
p-hydroxyphenylpyruvate dioxygenase from mouse, pig, and Streptomyces 
avermitilis using the Pileup program of GCG (Program Manual for the Wisconsin 
Package, Version 8, September 1994, Genetics Computer Group, 575 Science 
Drive, Madison, WI, USA 53711). This analysis suggested an additional 
29 amino acid extension at the amino terminus of the Arabidopsis sequence 
(positions 1-29, Figure 3 and SEQ ID NO:3). This amino-terminal extension was 
assumed to be a chloroplast transit peptide which would be absent from the 
mature enzyme. Therefore, the chloroplast transit peptide coding sequence was 
removed during transfer of the p-hydroxyphenylpyruvate dioxygenase 
coding sequence from the cloning vector into the expression vector. 

The Arabidopsis p-hydroxyphenylpyruvate dioxygenase cDNA was moved 
from the pBluescript SK- cloning vector (Stratagene, La Jolla, CA) to the 
pET24c(+) expression vector (Novagen, Madison, WI) through the intermediate 
cloning vector pT7BlueR (Novagen). The plasmid pGBPPD2 consists of the 
Arabidopsis p-hydroxyphenylpyruvate dioxygenase cDNA and the pBluescript 
SK- cloning vector (Stratagene). The plasmid pE24CPl consists of the 
Arabidopsis p-hydroxyphenylpyruvate dioxygenase cDNA, without the putative 
chloroplast transit peptide DNA sequence, and the pET24c(+) expression vector 
(Novagen). 

The plasmids pGBPPD2 and pT7BlueR (5 µg each) were individually 
digested with 20 units of Xba I (New England Biolabs, NEB, Beverly, MA) and 
20 units of Hind III (Gibco BRL, Gaithersburg, MD) in NEB restriction enzyme 
buffer 2 supplemented with 100 µg/mL bovine serum albumin at 37 °C for 1.75 h. 
Digesting pGBPPD2 with the restriction enzymes Xba I and Hind III releases the 
5' and 3' ends, respectively, of the p-hydroxyphenylpyruvate dioxygenase cDNA 
from the pBluescript SK- polylinker. Products of the digestion were 
electrophoretically separated in a 1 percent agarose gel using TRIS/acetate/EDTA (TAE) 
buffer and visualized with ethidium bromide staining (Maniatis). Digestion of 
pGBPPD2 with the two restriction endonucleases resulted in a 2922 bp vector 
band and 1499 bp p-hydroxyphenylpyruvate dioxygenase cDNA band. Only a 
2863 bp band was apparent after digesting pT7BlueR with the two enzymes, 
although a 24 bp fragment would also result. The 1499 bp 
p-hydroxyphenylpyruvate dioxygenase band and the 2863 bp pT7BlueR band were cut 
out of the gel and the associated DNA purified from the agarose using a QIAquick 
Gel Extraction Kit (Qiagen, Chatsworth, CA) according to the manufacturer's 
instructions. The purified DNA samples were precipitated by the addition of 
sodium acetate (pH 5.2) to 0.3 M, 10 µg tRNA (added as carrier), two volumes of 
-20 °C ethanol and incubation at -20 °C overnight. Nucleic acid pellets were 
collected by centrifugation, washed with 70% ethanol and air dried. Both pellets 
were solubilized in 10 µL of TRIS/EDTA (TE) buffer, pH 8 (Maniatis), and then 
1 µL of each sample loaded onto a 1% agarose, TAE gel in separate wells next to 
a well containing 4 µL of Mass Ladder (Gibco BRL). All samples were adjusted 
to 10 µL with water before loading. DNA was quantified by comparing band 
intensities of each sample with Mass Ladder band intensities following ethidium 
bromide staining and UV illumination. 

Approximately 300 ng of p-hydroxyphenylpyruvate dioxygenase insert was 
mixed with 300 ng of double digested pT7BlueR vector in a total volume of 7 µL 
and then heated to 45 °C for 5 min followed by cooling on ice. T4 DNA ligase 
buffer (Gibco BRL) and 1 unit of T4 DNA ligase (Gibco BRL) were added to the 
cooled DNA for a total volume of 10 µL. The ligation mix was incubated at room 
temperature for 4 h and then transformed into MAX Efficiency DH5α Competent 
Cells (Gibco BRL) of E. coli according to standard procedures (Maniatis). 
Transformed bacteria were spread onto LB agar plates supplemented with 
100 µg/mL carbenicillin and incubated overnight at 37 °C. Seventeen bacterial 
colonies were selected for subsequent analysis. A portion of each colony was 
inoculated into a separate 17 x 100 mm polypropylene culture tube (Falcon, 
Lincoln Park, NJ) containing 2 mL of liquid LB media and 200 µg/mL 
carbenicillin. Liquid bacteria cultures were incubated overnight at 37 °C with 
shaking (250 rpm). Plasmid DNA was then isolated using a QIAprep Spin 
Plasmid Miniprep Kit (Qiagen) according to the manufacturer's instructions. A 
portion (5 µL out of 50 µL total) of each plasmid preparation was digested with 
10 units each of Hind III and EcoR V (Gibco BRL) in a total volume of 15 µL 
with React 2 buffer (Gibco BRL) for one h. (Note: The EcoR V site in the 
pBluescript polylinker was destroyed during the preparation of pGBPPD2 so only 
the EcoR V site in the pT7BlueR polylinker would be accessible to the restriction 
nuclease). Samples were separated electrophoretically in 1% agarose and 
tris/borate/EDTA (TBE) buffer (Maniatis). Bands were visualized with ethidium 
bromide staining; the 7 out of 17 samples that showed 2 bands (2837 and 
1525 bp) contained the p-hydroxyphenylpyruvate dioxygenase insert and were 
designated pT7BlueR+PD01 (see Figure 4). 
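
Because equal masses of a short insert and a longer vector contain different numbers of molecules, it can help to express the ligation above in molar terms. The Python sketch below does this for the 300 ng of 1499 bp insert and 300 ng of 2863 bp vector stated in the text. The average mass of about 650 g/mol per base pair is a common approximation assumed here, not a value from this specification, and the function names are illustrative.

```python
# Insert:vector molar ratio for a ligation, from masses and fragment lengths.
# 650 g/mol per base pair is an assumed average for double-stranded DNA.

AVG_G_PER_MOL_PER_BP = 650.0


def pmol_dsdna(mass_ng: float, length_bp: int) -> float:
    """Convert a mass of double-stranded DNA (ng) to pmol of molecules."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return mass_ng * 1000.0 / (length_bp * AVG_G_PER_MOL_PER_BP)


def insert_to_vector_ratio(insert_ng: float, insert_bp: int,
                           vector_ng: float, vector_bp: int) -> float:
    """Molar ratio of insert to vector; independent of the assumed 650 g/mol."""
    return pmol_dsdna(insert_ng, insert_bp) / pmol_dsdna(vector_ng, vector_bp)


if __name__ == "__main__":
    ratio = insert_to_vector_ratio(300.0, 1499, 300.0, 2863)
    print(f"insert: {pmol_dsdna(300.0, 1499):.3f} pmol")
    print(f"vector: {pmol_dsdna(300.0, 2863):.3f} pmol")
    print(f"insert:vector molar ratio = {ratio:.2f}")
```

With equal masses the ratio reduces to the ratio of fragment lengths, so the insert is in roughly two-fold molar excess in this ligation.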

In order to remove the putative chloroplast transit sequence, the remaining 
45 µL of each prep of pT7BlueR+PD01 were combined into a single sample and 
the DNA content determined spectrophotometrically at A260 (Maniatis). A 
portion (5 µg) of pT7BlueR+PD01 was digested with 16 units of Eco47 III (MBI 
Fermentas) in a total volume of 100 µL containing buffer O (MBI Fermentas) at 
37 °C for 2 h. The digested plasmid DNA was then precipitated with sodium 
acetate and ethanol as above and the resulting dried nucleic acid pellet was 
dissolved in 60 µL of React 2 (Gibco BRL) containing 20 units of Nde I (Gibco 
BRL) and incubated 2 h at 37 °C. The double digested sample was then loaded 
onto a 1% agarose gel in TAE and the large 4166 bp Nde I-Eco47 III fragment 
separated from the 196 bp fragment electrophoretically. The large fragment was 
cut out of the gel, purified from agarose and precipitated as above. 
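
The spectrophotometric determination mentioned above is normally converted to a concentration with the standard factor of 50 µg/mL per A260 unit for double-stranded DNA at a 1 cm path length. The Python sketch below applies that factor to work out the volume holding the 5 µg used in the digest; the absorbance reading and dilution in the example are hypothetical, since the text does not report them.

```python
# Double-stranded DNA concentration from A260, and the volume for a target mass.
# The 50 ug/mL per A260 unit factor is the standard value for dsDNA (1 cm path);
# the example reading and dilution are hypothetical.

DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Concentration of the undiluted stock in ug/mL."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def volume_ul_for_mass(mass_ug: float, conc_ug_per_ml: float) -> float:
    """Volume of stock (uL) that contains the requested mass of DNA."""
    if conc_ug_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return mass_ug / conc_ug_per_ml * 1000.0


if __name__ == "__main__":
    conc = dsdna_ug_per_ml(a260=0.25, dilution_factor=20.0)  # hypothetical reading
    print(f"stock concentration: {conc:.0f} ug/mL")
    print(f"volume for 5 ug: {volume_ul_for_mass(5.0, conc):.1f} uL")
```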

An oligonucleotide mix was prepared consisting of 100 pmoles each of 
oligos CAM32 and CAM33 (SEQ ID NOS:4 and 5, respectively) in a combined 
volume of 9.9 µL. The two oligos complement each other to form a 3' blunt end 
corresponding to the 5' half of an Eco47 III restriction site and also form a 5' 
staggered end which corresponds to the 3' half of an Nde I restriction site. 

CAM32: (SEQ ID NO:4) 

5'-TATGTCCAAGTTCGTAAGAAAGAATCCAAAGTCTGATAAATTCAAGGTTAAGC-3' 

CAM33: (SEQ ID NO:5) 

5'-GCTTAACCTTGAATTTATCAGACTTTGGATTCTTTCTTACGAACTTGGACA-3' 
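
The stated end structure of the annealed oligos can be confirmed directly from the two sequences. The Python sketch below takes the reverse complement of CAM33, checks that it pairs flush with the 3' end of CAM32, and reports the unpaired 5' extension. The helper names are illustrative; the recognition sequences in the comments (Nde I, CA^TATG; Eco47 III, AGC^GCT) are the standard ones for these enzymes.

```python
# Check how oligos CAM32 and CAM33 anneal and what ends the duplex carries.
# Sequences are copied from SEQ ID NOS:4 and 5 as listed above.

CAM32 = "TATGTCCAAGTTCGTAAGAAAGAATCCAAAGTCTGATAAATTCAAGGTTAAGC"
CAM33 = "GCTTAACCTTGAATTTATCAGACTTTGGATTCTTTCTTACGAACTTGGACA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def five_prime_overhang(top: str, bottom: str) -> str:
    """Unpaired 5' extension of `top` when `bottom` pairs flush with top's 3' end.

    Raises ValueError if the strands are not complementary in that register.
    """
    paired = reverse_complement(bottom)
    if not top.endswith(paired):
        raise ValueError("strands do not anneal flush at the 3' end of the top strand")
    return top[: len(top) - len(paired)]


if __name__ == "__main__":
    overhang = five_prime_overhang(CAM32, CAM33)
    # Nde I cuts CA^TATG, leaving a 5'-TA extension followed by TG.
    print("5' overhang on CAM32:", overhang)
    # Eco47 III cuts AGC^GCT, so the blunt end should finish with AGC.
    print("blunt end finishes with:", CAM32[-3:])
    print("paired length (bp):", len(CAM33))
```

The 51 bp paired region plus the 2 nt extension accounts for the 53 nt length of CAM32, and the retained ATG at positions 2-4 of CAM32 supplies the new start codon.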

The oligo mix was heated to 90 °C for 1.5 min and then allowed to cool to 
room temperature over 20 min. The dried nucleic acid pellet resulting from 
purification of the 4166 bp Nde I-Eco47 III fragment was solubilized in 7 µL of 
the cooled oligo mix and subsequently heated to 45 °C for 5 min followed by 
cooling on ice. Ligation of the oligos with the Nde I-Eco47 III fragment followed 
by transformation into DH5α was performed as above. Transformed bacterial 
cells were spread onto LB/carbenicillin plates and incubated at 37 °C overnight. 
Seventeen colonies were selected and processed to isolate plasmid DNA as above. 
A portion (5 out of 50 µL) of each plasmid was double digested with 10 units each 
of Nde I and Hind III and the fragments separated electrophoretically on a 1% 
agarose gel in TBE. A two band pattern corresponding to insert (1373 or 1518 bp) 
and vector (2844 bp) was detected. An additional double digest with 10 units 
each of Xba I and Hind III was performed on another 5 µL aliquot of plasmids. 
None of the plasmids that showed the smaller insert size when digested with 
Nde I and Hind III contained an Xba I site. The Xba I site would be eliminated if 
the two oligos replaced the 196 bp fragment originally present in pT7BlueR+PD01. 
The 7 plasmid samples with the modified p-hydroxyphenylpyruvate dioxygenase 
insert were combined and designated pT7BlueR+PD02. 
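
The fragment sizes reported for the intermediate plasmids can be cross-checked by simple addition, since every complete double digest of a circular plasmid must sum to the plasmid length. The Python sketch below performs that bookkeeping with the sizes given in the text. Counting the annealed oligo pair as 51 bp (its paired length) is an assumption about the counting convention, and all names are illustrative.

```python
# Cross-check of restriction fragment sizes quoted for pT7BlueR+PD01 and +PD02.
# All sizes (bp) come from the text; OLIGO_DUPLEX_BP is an assumed convention.

INSERT_BP = 1499          # Xba I-Hind III cDNA fragment from pGBPPD2
VECTOR_BP = 2863          # Xba I-Hind III pT7BlueR fragment
PD01_TOTAL_BP = INSERT_BP + VECTOR_BP

PD01_DIGESTS = {
    "Hind III + EcoR V": (2837, 1525),
    "Eco47 III + Nde I": (4166, 196),
    "Nde I + Hind III": (2844, 1518),
}

OLIGO_DUPLEX_BP = 51      # paired length of the annealed oligos (assumption)
PD02_TOTAL_BP = 4166 + OLIGO_DUPLEX_BP
PD02_NDE_HIND = (2844, 1373)


def sums_to(total_bp: int, fragments: tuple) -> bool:
    """True when the fragments of a complete digest add up to the plasmid size."""
    return sum(fragments) == total_bp


def check_all() -> dict:
    """Return a pass/fail flag for each digest described in the text."""
    results = {f"PD01, {name}": sums_to(PD01_TOTAL_BP, frags)
               for name, frags in PD01_DIGESTS.items()}
    results["PD02, Nde I + Hind III"] = sums_to(PD02_TOTAL_BP, PD02_NDE_HIND)
    return results


if __name__ == "__main__":
    print("pT7BlueR+PD01 length:", PD01_TOTAL_BP, "bp")
    print("pT7BlueR+PD02 length:", PD02_TOTAL_BP, "bp")
    for name, ok in check_all().items():
        print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

Under these assumptions the 145 bp difference between the 1518 bp and 1373 bp inserts equals the 196 bp fragment removed minus the 51 bp oligo duplex added.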

The pT7BlueR+PD02 plasmid DNA was quantified spectrophotometrically 
(above) and then 5 µg was digested with 20 units each of Hind III and Nde I in 
62 µL of React 2 for 2 h at 37 °C. The digested sample was subsequently loaded 
onto a 1% agarose gel in TAE and separated electrophoretically. The 1373 bp 
fragment was isolated and precipitated as above. The plasmid pET24c(+) (5 µg) 
was double digested with 20 units each of both Nde I and Hind III in React 2 at 
37 °C for 2 h and the 5245 bp fragment then gel purified on a 1% agarose gel in 
TAE and subsequently separated from agarose and precipitated as above. The 
dried pET24c(+) pellet was solubilized in 10 µL TE and then 8 µL was adjusted to 
a 20 µL total volume with water, dephosphorylation buffer (Gibco BRL) and 
1 unit of calf intestinal alkaline phosphatase (Gibco BRL). The sample was 
incubated at 37 °C for 30 min and then gel purified, separated from agarose, and 
precipitated as above. The dried, dephosphorylated, pET24c(+) vector pellet and 
modified p-hydroxyphenylpyruvate dioxygenase insert pellet were each solubilized 
in 10 µL TE and then 1 µL of each was run on a 1% agarose TBE gel with 4 µL of 
mass ladder to quantify DNA as above. One hundred nanograms of modified 
p-hydroxyphenylpyruvate dioxygenase insert was mixed with 120 ng of 
dephosphorylated pET24c(+) vector in a total volume of 7 µL. The mix was 
heated to 45 °C for 5 min and then cooled on ice. The mix was then supplemented 
with T4 DNA ligase buffer and 1 unit of T4 DNA ligase in a total volume of 
10 µL and the mix allowed to incubate at room temperature for 4 h. The ligation 
mix was subsequently transformed into DH5α, spread on LB agar supplemented 
with 30 µg/mL kanamycin, and incubated overnight at 37 °C. Plasmid 
preparations were performed on 11 colonies as above. Plasmids were double 
digested with Nde I and Hind III and fragments separated electrophoretically. All 
plasmids had the expected 1373 bp and 5245 bp fragments. One bacterial colony 
was selected and used to inoculate 100 mL of liquid LB supplemented with 
30 µg/mL kanamycin which was subsequently incubated at 37 °C overnight with 
shaking. Plasmid DNA was isolated from the resulting bacterial culture using a 
Qiagen Plasmid Midi Kit according to the manufacturer's instructions. A portion 
of the plasmid DNA (pE24CPl) was sequenced with the Sequenase Version 2.0 
DNA Sequencing Kit (United States Biochemical, Cleveland, OH) using a 
biotinylated sequencing primer to the T7 promoter (United States Biochemical) 
according to the manufacturer's instructions for non-radioactive manual 
sequencing. DNA was transferred from the sequencing gel to Hybond-N+ nylon 
transfer membrane (Amersham, Arlington Heights, IL) by capillary action. 
Transfer and all subsequent steps in chemiluminescent detection of DNA 
fragments were performed with a SEQ-Light Chemiluminescent Sequencing 
System kit (Tropix, Bedford, MA) according to the manufacturer's instructions. 
DNA sequencing verified that the plasmid contained the expected 5' sequence for 
the modified p-hydroxyphenylpyruvate dioxygenase insert where nucleotides 1-95 
(Figure 2) were replaced with an ATG translational start site. This is equivalent 
to amino acids 2-29 (Figure 3) being eliminated from the N-terminus of the 
Arabidopsis p-hydroxyphenylpyruvate dioxygenase amino acid sequence. 
The plasmid pE24CPl was transformed into competent cells of BL21(DE3) 
E. coli (Novagen), as above. Transformed cells were spread on LB/kanamycin 
plates and incubated overnight at 37 °C. Seven colonies were selected for plasmid 
preparations as above and plasmid DNA was double digested with Nde I and 
Hind III to verify that all plasmids had the expected electrophoretic banding 
pattern. One colony was selected and streaked for isolation on LB/kanamycin 
plates. A well isolated colony was used to inoculate liquid LB supplemented with 
30 µg/mL kanamycin and the culture was incubated at 37 °C with shaking 
(250 rpm) until it reached an A600 of 0.6 absorbance units. An 8% glycerol 
freezer stock was prepared according to the Novagen protocol and stored at 
-80 °C. All subsequent expression studies were done with freshly grown bacterial 
cells that were isolated from LB/kanamycin plates streaked from the glycerol 
freezer stock. 

BL21(DE3) E. coli cells containing either pE24CP1 or pET24c(+) (negative
control) were streaked out onto LB/kanamycin plates from a glycerol freezer stock
(above) and incubated overnight at 37 °C. One isolated colony was selected for
inoculation of 2 mL of LB containing 30 µg/mL kanamycin in a 17 x 100 mm
Falcon tube, and the culture was incubated at 37 °C with shaking (250 rpm)
overnight. The overnight cultures were then used to inoculate 100 mL of fresh LB
containing 30 µg/mL kanamycin. The new cultures were incubated at 37 °C with
shaking until the A600 reached between 0.4 and 0.6 absorbance units. One half of
each of the pE24CP1 and pET24c(+) cultures was placed in a new culture flask,
and IPTG (isopropylthio-β-D-galactoside; Gibco BRL) was added to the new flasks
to give a final concentration of 1 mM. The flasks were incubated an additional
3 h at 37 °C with shaking, and then the cells were harvested.

The harvested cells were centrifuged and the resulting cell pellet extracted
by sonication (3 x 10 sec bursts) in 2 mL extraction buffer (50 mM (20 mM in the
first experiment; Table 2) potassium phosphate buffer, pH 7.2, containing 0.14 M
KCl, 0.32 mM reduced glutathione, 1% polyvinylpolypyrrolidone, and 0.1%
Triton X-100 (0.01% lysozyme was included in the first experiment only)). The
lysate represents the crude extracted enzyme after centrifugation at 17000 g for
10 min. In the first experiment (Table 2) a 20 to 60% ammonium sulfate
precipitated enzyme fraction was also assayed. Solid ammonium sulfate was
slowly added with stirring to 2 mL of the lysate to bring the concentration to 20%
(w/v). After incubation on ice for approximately 15 min, the solution was
centrifuged at 17000 g for 10 min. The supernatant liquid was harvested and solid
ammonium sulfate was added to increase the concentration to 60% (w/v). After
centrifugation, the resulting pellet was resuspended in 1 mL of the extraction
buffer.

A portion of the insoluble protein resulting from expression of Arabidopsis
p-hydroxyphenylpyruvate dioxygenase in bacteria was utilized for N-terminal
sequence analysis. The protein (approximately 180 µg) was suspended in 60 µL
of extraction buffer and then diluted with 5 volumes of sample buffer (62.5 mM
Tris, pH 6.8, 6 M urea, 160 mM dithiothreitol, 0.01% bromophenol blue)
followed by intermittent vortexing for one hour at room temperature. A 1.5 mm
thick, 12% polyacrylamide resolving gel was prepared for a Mini-PROTEAN II dual
slab cell (Bio-Rad, Hercules, CA) using the manufacturer's instructions. The
polyacrylamide was allowed to polymerize for 3 h and then a stacking gel was
prepared using a preparative comb. The running buffer was prepared according to
the manufacturer's instructions with the addition of 0.1 mM sodium thioglycolate.
The solubilized protein sample was electrophoretically separated using the
manufacturer's instructions. When the bromophenol blue dye front reached the
bottom of the gel, the gel was removed and equilibrated for 5 min in blotting
buffer (10 mM CAPS, pH 11, 10% methanol, balance water). The gel was then
placed in a Mini Trans-Blot Electrophoretic Transfer Cell (Bio-Rad), according to
the manufacturer's instructions, with a ProBlott PVDF membrane (Applied
Biosystems, Foster City, CA) treated according to the manufacturer's instructions.
Electroblotting was done in the presence of blotting buffer at 50 volts for 45 min
in an ice bath. The membrane was then rinsed in water and stained with
Coomassie Blue as described in the ProBlott protocol. The major protein band
was excised from the membrane and subjected to N-terminal amino acid
sequencing on a Beckman (Fullerton, CA) LF3000 protein sequencer. The first
11 cycles identified S-K-F-V-R-K-N-P-K-S-D (see SEQ ID NO:3, amino acids
30-40). This is the expected N-terminus of the modified Arabidopsis
p-hydroxyphenylpyruvate dioxygenase minus the initial methionine (amino acids
30-40, Figure 3).
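
As an illustration, the reported Edman result can be compared against the sequence listing. The sketch below transcribes the first 46 residues of SEQ ID NO:3 into one-letter code (the transcription and the helper `residues` are conveniences introduced here, not part of the original disclosure) and extracts positions 30-40.

```python
# First 46 residues of SEQ ID NO:3, transcribed from the sequence listing
# into one-letter code for this illustration.
N_TERMINUS = "MGHQNAAVSENQNHDDGAASSPGFKLVGFSKFVRKNPKSDKFKVKR"

def residues(seq, first, last):
    """Return residues first..last (1-based, inclusive)."""
    return seq[first - 1:last]

observed = "SKFVRKNPKSD"                 # the 11 sequencing cycles reported above
expected = residues(N_TERMINUS, 30, 40)  # amino acids 30-40

print(expected, expected == observed)
```

The slice reproduces the sequenced peptide, consistent with a protein that begins at residue 30 once the initiator methionine has been removed.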

EXAMPLE 3

p-Hydroxyphenylpyruvate Dioxygenase Enzymatic Activity
of the Plant Protein Expressed in E. coli

Cell cultures with different plasmid constructs were extracted as described
above and assayed by measuring the formation of 14CO2 from
[1-14C]-p-hydroxyphenylpyruvate or 14CO2 and 14C-homogentisate from
[U-14C]-p-hydroxyphenylpyruvate (Lindblad, B., (1971) Clin. Chim. Acta
34:113-121; and Lindstedt, S. and Odelhog, B., (1987) Methods in Enzymology
142:143-148). The labeled substrate was prepared from [1-14C]-L-tyrosine
(55 mCi/mmol; American Radiolabeled Chemicals, Inc., St. Louis, MO) or
[U-14C]-L-tyrosine (498 mCi/mmol; DuPont NEN, Boston, MA). A 50-100 µL
aliquot (5-10 µCi) of the labeled tyrosine stock solution was transferred to a
4 mL glass vial and blown to dryness in a stream of nitrogen at 45 °C. To the vial
was added 175 µL of 0.1 M phosphate buffer, pH 6.5, 5 µL catalase (28,700 units
of C-100, Sigma Chemical Co., St. Louis, MO), and 20 µL L-amino acid oxidase
(Sigma A-9253, 6.5 units/mL). The vial was then placed on a shaker water bath
set at 30 °C, 60 cycles/min, for 0.5 to 1 h. The reaction mix was then passed
through a small column containing 400 µL Dowex AG 50W-X8 cation exchange
resin. The column was then washed with 1.5 mL of water and the eluant
containing the labeled p-hydroxyphenylpyruvate was collected. The labeled
substrate was either used immediately or stored at -80 °C and used within a week
after preparation.

The assay was performed in 14 mL culture tubes capped with serum
stoppers through which a polypropylene well containing 200 µL of 1 N KOH was
suspended. The reaction mixture contained 5,740 units of catalase, 100 µL of a
freshly prepared 1:1 (v:v) mixture of 150 mM reduced glutathione and 3 mM
dichlorophenolindophenol, 5 mM ascorbate, 0.1 mM ferrous sulfate (the ascorbate
and ferrous sulfate were not present in the buffer used in the first experiment;
Table 2), 50 µM unlabeled p-hydroxyphenylpyruvate, 1-25 µL of the enzyme
extract, and 50 mM potassium phosphate buffer in a final volume of 980 µL.
Unlabeled substrate was made fresh daily in 50 mM potassium phosphate buffer
and allowed to equilibrate for at least 2 h at room temperature to ensure that
greater than 95% was in the keto form. The tubes were incubated for 10 min at
30 °C in a shaking water bath prior to adding 20 µL (0.04 µCi) of
14C-p-hydroxyphenylpyruvate. The reaction was terminated after 60 min by
injecting 500 µL of 1 N sulfuric acid through the serum stopper. The vials were
left on the shaker for another 30 min to ensure complete capture of the released
14CO2. The serum caps were then removed and the wells cut and dropped into
8 mL scintillation vials. Six mL of Formula-989 scintillation fluid (Packard
Instruments, Meriden, CT) was added to the vials and the 14C radioactivity was
determined by scintillation counting. Table 2 summarizes the results of this
experiment.

Table 2

p-Hydroxyphenylpyruvate Dioxygenase Activity of Extracts from
E. coli Containing Different Plasmid Constructs

                                   Lysate              Ammonium Sulfate Precipitate
             Inducer
Plasmid      (1 mM IPTG)   dpm*/mg   nmol/min x mg     dpm*/mg     nmol/min x mg

pET24c(+)         -         12,318       0.09                0         0.00
pET24c(+)         +         35,115       0.25            3,393         0.03
pE24CP1           -         24,607       0.17          126,761         0.89
pE24CP1           +        243,801       1.71        1,371,823         9.64

* 14C:12C = 1:50; sp. act. of 14C-p-hydroxyphenylpyruvate = 55 mCi/mmol

The results show there was little or no p-hydroxyphenylpyruvate
dioxygenase activity in any of the cell cultures that did not have both the plasmid
containing the nucleic acid fragment encoding p-hydroxyphenylpyruvate
dioxygenase (pE24CP1) and the inducer of gene expression (IPTG). The gene
and inducer together resulted in a marked increase in activity.

In the experiment with [U-14C]-p-hydroxyphenylpyruvate ("HPPA"), where
both 14CO2 and 14C-homogentisic acid were measured, the reaction was initiated
by adding 50 µL of labeled substrate (0.3 µCi) and was terminated with 100 µL of
10% phosphoric acid. The 14CO2 released was determined by scintillation
counting, while the level of homogentisic acid was determined by HPLC on a
Zorbax RX-C8 column (4.6 x 250 mm) with an in-line radioactivity detector.
Aliquots of 1.7 to 15 µL were taken from the reaction mix after centrifugation and
diluted into the column equilibration buffer prior to injection. Separation was
performed at ambient temperature with a flow rate of 1.0 mL/min and the
following gradient, with solvents A and B being water and methanol, each with 1%
phosphoric acid: 0-2 min, isocratic at 95% A and 5% B; 2-17 min, linear gradient
from 95 to 75% A and 5 to 25% B; 17-19 min, linear gradient from 75 to 5% A
and 25 to 95% B; 19-22 min, isocratic at 5% A and 95% B; 22-24 min, linear
gradient from 5 to 95% A and 95 to 5% B. In this system homogentisate eluted
at 10.8 min. The results from this experiment are shown in Table 3.
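
The gradient program described above can be written as a piecewise-linear table. The following is a minimal sketch of the programmed solvent composition only; it ignores instrument dwell volume and column transit time, so the value at 10.8 min is the nominal pump composition rather than the composition actually experienced by homogentisate. The names `GRADIENT` and `percent_b` are illustrative.

```python
# Gradient breakpoints from the text: (time in min, % solvent B).
GRADIENT = [(0, 5), (2, 5), (17, 25), (19, 95), (22, 95), (24, 5)]

def percent_b(t):
    """Linearly interpolate the programmed % B (methanol phase) at time t (min)."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)

# Nominal composition when homogentisate elutes (10.8 min).
b = percent_b(10.8)
print(f"{b:.1f}% B, {100 - b:.1f}% A")
```

Under these simplifications, homogentisate elutes roughly midway through the shallow 5 to 25% methanol ramp.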

Table 3

p-Hydroxyphenylpyruvate Dioxygenase Activity of Cell Extracts
Determined by CO2 Release and Homogentisic Acid Synthesis
from [U-14C]-p-Hydroxyphenylpyruvate

             Inducer               nmol/min x mg*
Plasmid      (1 mM IPTG)      14CO2      Homogentisic acid

pET24c(+)         -            0.00            0.00
pET24c(+)         +            0.19            0.00
pE24CP1           -            4.68            4.76
pE24CP1           +           29.12           29.82

* 14C:12C = 1:87.7; sp. act. of 14C[U]-p-HPPA = 498 mCi/mmol

There was a tight correlation between the results from the assays of the two
products of the reaction. The results confirmed there was no significant
p-hydroxyphenylpyruvate dioxygenase activity in either cell culture that did not
contain the nucleic acid fragment encoding p-hydroxyphenylpyruvate
dioxygenase. There was measurable enzyme activity in the absence of the
inducer, but when the inducer was added the activity increased more than
six-fold over uninduced cultures. These results and those of Table 2 clearly show
that the nucleic acid fragment isolated and overexpressed in E. coli cells encodes a
protein that catalyzes the conversion of p-hydroxyphenylpyruvate to
homogentisate with the release of CO2.
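
The fold induction can be recomputed directly from the pE24CP1 rows of Table 3. This is only a worked arithmetic check; the dictionary layout and the function name `fold_induction` are illustrative.

```python
# Specific activities (nmol/min x mg) for pE24CP1 extracts, taken from Table 3.
table3 = {
    "14CO2": {"uninduced": 4.68, "induced": 29.12},
    "homogentisic acid": {"uninduced": 4.76, "induced": 29.82},
}

def fold_induction(uninduced, induced):
    """Ratio of induced to uninduced specific activity."""
    return induced / uninduced

for product, v in table3.items():
    ratio = fold_induction(v["uninduced"], v["induced"])
    print(f"{product}: {ratio:.2f}-fold")
```

Both products give a ratio slightly above six, in line with the statement above and with the close agreement between the two assays.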

The overexpressed protein was also assayed spectrophotometrically at
ambient temperature using the enol borate-tautomerase assay (Lin, E. C. C. et al.,
(1958) J. Biol. Chem. 233:668-673). The assay buffer contained 0.4 M borate
(adjusted to pH 7.2 with 0.2 M sodium borate), 4 mM ascorbate, 2.5 mM EDTA,
40 µM p-hydroxyphenylpyruvate, and 0.5 units of tautomerase (Sigma T-6004)
per 10 mL buffer. The reaction mix was used when the tautomerization of the
substrate was complete (when absorbance at 308 nm had stabilized). The assay
was initiated by adding 40 µL of the cell extracts to 960 µL of the assay buffer,
and the reaction was followed by measuring the decrease in absorbance at 308 nm.
Table 4 summarizes the results with extracts of the same four cell cultures
described in Table 3.

Table 4

Spectrophotometric Assay of p-Hydroxyphenylpyruvate
Dioxygenase Activity of Cell Extracts

             Inducer
Plasmid      (1 mM IPTG)      nmol p-HP lost/min x mg*

pET24c(+)         -                    1.58
pET24c(+)         +                    2.73
pE24CP1           -                    4.91
pE24CP1           +                   22.32

* Loss of p-hydroxyphenylpyruvate based on a molar extinction
coefficient for the equilibrium mixture of 9850 as reported by
Lin et al. ((1958) J. Biol. Chem. 233:668-673).
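
For illustration, an absorbance slope can be converted into the units of Table 4 with the Beer-Lambert law. The sketch below assumes that the extinction coefficient of 9850 is expressed in M^-1 cm^-1 and that a 1 cm path length cuvette is used; neither is stated in the text. The absorbance rate and protein amount in the example are hypothetical.

```python
EPSILON = 9850.0          # assumed units: M^-1 cm^-1 (Table 4 footnote gives 9850)
ASSAY_VOLUME_L = 1.0e-3   # 40 uL extract + 960 uL assay buffer

def specific_activity(dA_per_min, mg_protein, path_cm=1.0):
    """Convert a rate of absorbance decrease at 308 nm into
    nmol substrate lost per min per mg protein."""
    molar_per_min = dA_per_min / (EPSILON * path_cm)       # mol/L per min
    nmol_per_min = molar_per_min * ASSAY_VOLUME_L * 1e9    # nmol per min in the cuvette
    return nmol_per_min / mg_protein

# Hypothetical reading: 0.0197 absorbance units lost per minute, 0.1 mg protein.
print(round(specific_activity(0.0197, 0.1), 2))
```

With these hypothetical inputs the result is about 20 nmol/min x mg, the same order of magnitude as the induced pE24CP1 extract in Table 4.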

EXAMPLE 4

Inhibition of p-Hydroxyphenylpyruvate Dioxygenase by Commercial Herbicides

The enzymatic activity of the overexpressed protein is inhibited by two
herbicides known to inhibit plant p-hydroxyphenylpyruvate dioxygenase:
sulcotrione (2-(2-chloro-4-methanesulfonylbenzoyl)-1,3-cyclohexanedione) and
isoxaflutole (5-cyclopropylisoxazol-4-yl 2-mesyl-4-trifluoromethylphenyl
ketone). These two compounds were tested against the overexpressed protein
using both the 14CO2 and the continuous spectrophotometric enol
borate-tautomerase assays. Both compounds were added to the assay buffers in
10 µL of acetone or dimethyl sulfoxide. The I50 values (concentration inhibiting
the enzyme 50%) were calculated based on the percent inhibition observed over
several concentrations of the inhibitor. The results of the assays are shown in
Table 5.

Table 5

I50 Values of Inhibitors of Plant p-Hydroxyphenylpyruvate Dioxygenase

                      I50 value (nM) derived from
Compound          14CO2 assay     spectrophotometric assay

sulcotrione            43                   44
isoxaflutole          409                 1042



These results clearly show that the p-hydroxyphenylpyruvate dioxygenase
activity of the overexpressed protein is inhibited by commercial herbicides that
have inhibition of this enzyme as their mode of action. Moreover, the continuous
spectrophotometric assay gave similar I50 values to those obtained with the 14CO2
assay. The spectrophotometric assay can be adapted to a high capacity screen for
inhibitors of p-hydroxyphenylpyruvate dioxygenase by adapting it to a microtiter
plate assay combined with a plate reader that would read at or near 308 nm.
Furthermore, any colorimetric or fluorescent assay for homogentisate or
p-hydroxyphenylpyruvate could also be readily adapted into a high
capacity screen for inhibitors of this enzyme. The isolated overexpressed enzyme
has sufficient activity to be used directly in a spectrophotometric assay, or it can be
further purified for enhanced assay sensitivity.
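
The text does not state how the I50 values were derived from the percent-inhibition data. One common approach, shown here purely as an assumed illustration, is to interpolate on a logarithmic concentration axis between the two tested concentrations that bracket 50% inhibition. The dose-response data below are hypothetical and are not taken from the source.

```python
import math

def i50_from_points(concs_nM, pct_inhibition):
    """Interpolate, on a log10 concentration axis, where inhibition crosses 50%."""
    points = sorted(zip(concs_nM, pct_inhibition))
    for (c0, p0), (c1, p1) in zip(points, points[1:]):
        if p0 <= 50.0 <= p1 and p1 > p0:
            frac = (50.0 - p0) / (p1 - p0)
            log_c = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
            return 10 ** log_c
    raise ValueError("inhibition never crosses 50% in the tested range")

# Hypothetical dose-response data (nM inhibitor, % inhibition).
concs = [10, 30, 100, 300]
inhib = [20, 40, 70, 90]
print(f"I50 is approximately {i50_from_points(concs, inhib):.1f} nM")
```

A full curve fit (for example a four-parameter logistic model) would normally be preferred when enough concentrations are available; the interpolation above is only the simplest estimate.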

EXAMPLE 5

Re-construction of the Full-length p-Hydroxyphenylpyruvate Dioxygenase Gene
for Production of Active, Stable Enzyme in Bacteria

The plasmid pT7BlueR+PDO2, described in Example 2 and containing the
full-length p-hydroxyphenylpyruvate dioxygenase gene, proved to have incorrect
sequence at the EcoRI site. This was re-sequenced so that an oligonucleotide
could be designed to replace the EcoRI site with an NdeI site using conventional
loop-out mutagenesis. The oligonucleotide was designed so that this procedure
also introduced an ATG initiation codon at the 5' end of the
p-hydroxyphenylpyruvate dioxygenase gene followed by the full-length
p-hydroxyphenylpyruvate dioxygenase sequence. After mutagenesis, the clone was
amplified in E. coli and the plasmid was purified. The resulting full-length gene,
"PDO-B", was then digested with the enzymes NdeI and NheI, and the ~820 bp
fragment used to replace the NdeI-NheI segment of the truncated
p-hydroxyphenylpyruvate dioxygenase gene, "PDO-A", in pE24CP1 (Example 1).
The resulting plasmid, pE24PDO-B, can be expressed in bacteria to produce the
full-length Arabidopsis p-hydroxyphenylpyruvate dioxygenase enzyme as
determined by enzyme activity and N-terminal sequence analysis.

EXAMPLE 6

Enhanced Stability of Full Length Construct Over the Truncated Construct

Two different constructs for Arabidopsis thaliana p-hydroxyphenylpyruvate
dioxygenase, one containing the full-length sequence, PDO-B, as
described in Example 5 and produced from plasmid pE24PDO-B, and one
containing the truncated sequence lacking the putative chloroplast leader
sequence, PDO-A, as produced from plasmid pE24CP1, were both purified to the
same extent using a Pharmacia phenyl Sepharose hydrophobic interaction column
followed by gel filtration chromatography on Pharmacia Sephacryl 300. The two
proteins were diluted to 1 mg/mL in 20 mM bis-tris propane buffer, pH 7.2,
containing 5 mM ascorbate, 1 mM reduced glutathione and 0.1 mM ferrous
ammonium sulfate and stored in a refrigerator at 4 °C for up to 10 days. Aliquots
were removed at various times and assayed for activity using the tautomerase
coupled spectrophotometric assay. Under these conditions the half-life for the
activity of the full length enzyme was 4 days, whereas the truncated enzyme
preparation had a half-life of 9 to 10 hours. In addition, the activity of the full
length enzyme could be restored by incubation with iron and reducing agent
(reduced glutathione or ascorbate), or by dialysis against buffer containing iron
and reducing agent. In contrast, the activity of the truncated enzyme could not be
restored by incubation with or dialysis against buffer containing iron and reducing
agent. The full-length enzyme was also more stable in the spectrophotometric
assay, showing a 2 to 3 times longer useful linear region than the truncated
enzyme. Both enzyme preparations showed similar I50 values with the
herbicidally active inhibitors.
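
To put the two half-lives in perspective, the sketch below estimates the fraction of activity remaining after one day of storage. It assumes simple first-order (exponential) loss of activity, which the text does not state, and uses 9.5 h as the midpoint of the reported 9 to 10 h range for the truncated enzyme.

```python
def fraction_remaining(hours, half_life_hours):
    """Fraction of initial activity left, assuming first-order decay."""
    return 0.5 ** (hours / half_life_hours)

FULL_LENGTH_T_HALF = 4 * 24.0   # 4 days, from the text
TRUNCATED_T_HALF = 9.5          # assumed midpoint of the reported 9 to 10 h

for label, t_half in [("full-length", FULL_LENGTH_T_HALF),
                      ("truncated", TRUNCATED_T_HALF)]:
    left = fraction_remaining(24.0, t_half)
    print(f"{label}: {left:.0%} of activity after 24 h")
```

Under this assumption the full-length enzyme retains most of its activity after a day at 4 °C, while the truncated enzyme retains less than a fifth, which corresponds to a roughly ten-fold difference in half-life.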

These results clearly show that the full-length PDO-B construct has
decided advantages over the truncated enzyme due to its enhanced stability under
storage conditions and in the spectrophotometric assay, and to the reversible
reconstitution of activity in the presence of iron and reducing agent. While both
enzyme constructs can be used for screening of inhibitors, the PDO-B enzyme is
preferred for this application and is far superior for mechanistic and structural
studies.

EXAMPLE 7

Cloning of the Maize p-Hydroxyphenylpyruvate Dioxygenase Gene

Approximately 600,000 plaques of a Stratagene maize Uni-Zap cDNA
library (from young plants) were screened by filter hybridization under moderate
stringency using a heterologous probe. The probe was prepared by PCR and was
a 916 bp fragment of DNA having the sequence defined by the region extending
from position 263 to 1178 of SEQ ID NO:14. Twenty-four positive phage clones
were identified in the primary screen, and eleven phage clones were recovered
from a secondary screen. Seven positive clones were submitted for sequencing,
and four showed significant sequence conservation at the amino acid level when
compared with the Arabidopsis thaliana p-hydroxyphenylpyruvate dioxygenase
protein. The longest of the four contained an insert of 988 bp and showed 70%
identity and 78% similarity with the Arabidopsis protein, but was lacking
approximately 550 bp corresponding to the amino terminal end of the protein.

Attempts to obtain a full-length cDNA of the maize p-hydroxyphenylpyruvate
dioxygenase gene were unsuccessful, possibly because the secondary
structure of the RNA inhibited efficient reverse transcription of this transcript.
Two additional cDNA libraries were screened and clones long enough to contain a
full-length cDNA were sequenced. All of these clones were shown to be
chimeras. Therefore a genomic library was screened to obtain the 5' one-third of
the gene. Approximately 1 million clones from a Clontech Zea mays (var. B73)
library in the phage vector EMBL3 (whole seedlings, 2 leaf stage) were screened
using a 415 bp EcoRI-BssHII fragment containing the 5' end of the truncated corn
p-hydroxyphenylpyruvate dioxygenase cDNA (clone H1011C). Eight positive
primary phage clones were plated and screened, and four secondary clones were
picked. DNA was prepared from each using the Qiagen Lambda midi-kit.
Restriction digests with SalI or EcoRI indicated that two clones were the same.
DNA samples from the remaining 3 clones (11.1.3, 13.1.1, and 21.2.1) were
digested with SalI, EcoRI, or SalI and EcoRI, prepared for Southern analysis, and
probed with the full length Arabidopsis p-hydroxyphenylpyruvate dioxygenase
gene. Two of the clones (11.1.3 and 13.1.1) showed sequence conservation, and
these homologous fragments were subcloned and sequenced. Both clones
appeared to contain the full-length gene and each contained one intron near the 3'
end of the gene. However, there were differences between the sequences of the
two clones, indicating that they may be two different genes or that one may be a
pseudogene. The sequence of clone 11.1.3 matched the cDNA sequence, and this
clone was used to construct a full length p-hydroxyphenylpyruvate dioxygenase
coding region.

The gene was contained on two adjacent fragments, a 3.5 kb EcoRI-SalI
fragment and a 2 kb SalI fragment. Both were subcloned into pBluescript SKII+,
resulting in the plasmids pES1113 and pSalI1113. pES1113 was digested with
SpeI to release approximately 2.7 kb of upstream sequence and then religated,
resulting in a plasmid with an insert of 747 base pairs (pSPE1). pSPE1 was
digested with SalI to linearize the plasmid and ligated with the 2 kb SalI fragment
from pSalI1113, which had been released by digestion with SalI and gel purified.
Orientation was confirmed by digestion with SpeI and Bpu1102I, and the correct
plasmid was named p1113. In order to remove the intron contained in the 3' end
of the genomic clone, the plasmid was digested with Bpu1102I and XhoI and the
3.9 kb fragment containing the vector and 5' part of the gene was gel purified.
The corresponding 882 bp Bpu1102I-XhoI fragment from pH1011c (cDNA) was
gel purified and ligated with this 3.9 kb fragment, resulting in the clone pMPDO
(ATCC 209120), which contains a 1782 bp insert. There are 260 base pairs
upstream of the putative ATG and 189 base pairs downstream of the stop codon.
The full-length sequence was confirmed by sequencing across the insert. The
nucleic acid sequence and the deduced protein sequence for corn
p-hydroxyphenylpyruvate dioxygenase are presented in SEQ ID NOS:10 and 11,
respectively. The sequences for p-hydroxyphenylpyruvate dioxygenases obtained
from corn and Arabidopsis were compared using the "Gap" program of GCG

WO 97/49816 PCT/US97/11295

(Program Manual for the Wisconsin Package, Version 9.0-OpenVMS, December
1996, Genetics Computer Group, 575 Science Drive, Madison, WI, USA 53711).
The results of these comparisons indicated that these sequences are approximately
67% identical at the nucleotide level, and they possess 69% similarity and 62%
identity at the amino acid level. The predicted amino acid sequence of corn
p-hydroxyphenylpyruvate dioxygenase is compared with that from Arabidopsis
and other eukaryotes in Figure 3.

EXAMPLE 8

Composition of a cDNA Library: Isolation and Sequencing of cDNA Clones

A cDNA library representing mRNAs from developing seeds of Vernonia
galamenensis that had just begun production of vernolic acid was prepared. The
library was prepared in a Uni-ZAP™ XR vector according to the manufacturer's
protocol (Stratagene Cloning Systems, La Jolla, CA). Conversion of the
Uni-ZAP™ XR library into a plasmid library was accomplished according to the
protocol provided by Stratagene. Upon conversion, cDNA inserts were contained
in the plasmid vector pBluescript. cDNA inserts from randomly picked bacterial
colonies containing recombinant pBluescript plasmids were amplified via
polymerase chain reaction using primers specific for vector sequences flanking
the inserted cDNA sequences. Amplified insert DNAs were sequenced in
dye-primer sequencing reactions to generate partial cDNA sequences (expressed
sequence tags or "ESTs"; see Adams, M. D. et al., (1991) Science 252:1651). The
resulting ESTs were analyzed using a Perkin Elmer Model 377 fluorescent
sequencer.

EXAMPLE 9

Identification and Characterization of cDNA Clones

ESTs encoding Vernonia galamenensis enzymes were identified by
conducting BLAST (Basic Local Alignment Search Tool; Altschul, S. F. et al.,
(1993) J. Mol. Biol. 215:403-410; see also www.ncbi.nlm.nih.gov/BLAST/)
searches for similarity to sequences contained in the BLAST "nr" database
(comprising all non-redundant GenBank CDS translations, sequences derived
from the 3-dimensional structure Brookhaven Protein Data Bank, the last major
release of the SWISS-PROT protein sequence database, EMBL, and DDBJ
databases). The cDNA sequences obtained in Example 8 were analyzed for
similarity to all publicly available DNA sequences contained in the "nr" database
using the BLASTN algorithm provided by the National Center for Biotechnology
Information (NCBI). The DNA sequences were translated in all reading frames
and compared for similarity to all publicly available protein sequences contained
in the "nr" database using the BLASTX algorithm (Gish, W. and States, D. J.
(1993) Nature Genetics 3:266-272) provided by the NCBI. For convenience, the
P-value (probability) of observing a match of a cDNA sequence to a sequence
contained in the searched databases merely by chance, as calculated by BLAST, is
reported herein as a "pLog" value, which represents the negative of the logarithm
of the reported P-value. Accordingly, the greater the pLog value, the greater the
likelihood that the cDNA sequence and the BLAST "hit" represent homologous
proteins.
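
The pLog convention can be illustrated with a short conversion. The sketch below assumes a base-10 logarithm, which the text does not state explicitly; the function names are illustrative.

```python
import math

def plog(p_value):
    """Negative base-10 logarithm of a BLAST P-value (base 10 is assumed)."""
    return -math.log10(p_value)

def p_from_plog(value):
    """Recover the P-value corresponding to a given pLog."""
    return 10.0 ** (-value)

# Example: a pLog of 8.34 corresponds to a chance-match probability near 4.6e-9.
print(f"{p_from_plog(8.34):.2e}")
print(f"{plog(1e-8):.2f}")
```

Each unit increase in pLog therefore corresponds to a ten-fold smaller probability that the match arose by chance.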

The BLASTX search using clone vsl.pk0015.b2 revealed similarity of the
protein encoded by the cDNA to a number of p-hydroxyphenylpyruvate
dioxygenases from sources other than plants. The three most similar
p-hydroxyphenylpyruvate dioxygenase proteins were a streptomycete
p-hydroxyphenylpyruvate dioxygenase (GenBank Accession No. U11864;
pLog = 8.34), a rat p-hydroxyphenylpyruvate dioxygenase (GenBank Accession
No. M18405; pLog = 7.66), and a human p-hydroxyphenylpyruvate dioxygenase
(GenBank Accession No. U29895; pLog = 7.60). SEQ ID NO:16 shows the
nucleotide sequence of a portion of the Vernonia galamenensis cDNA in clone
vsl.pk0015.b2. Sequence alignments and BLAST scores and probabilities
indicate that the instant nucleic acid fragment encodes a portion of Vernonia
galamenensis p-hydroxyphenylpyruvate dioxygenase.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

    (i) APPLICANT:
        (A) NAME: E. I. DUPONT DE NEMOURS AND COMPANY
        (B) STREET: 1007 MARKET STREET
        (C) CITY: WILMINGTON
        (D) STATE: DELAWARE
        (E) COUNTRY: U.S.A.
        (F) POSTAL CODE (ZIP): 19898
        (G) TELEPHONE: 3 0 ,: - o 92 - 8 1 1 2
        (H) TELEFAX: 302-7"-oi6i
        (I) TELEX: 6717325

    (ii) TITLE OF INVENTION: PLANT GENE FOR p-HYDROXY-

    (iii) NUMBER OF SEQUENCES: 16

    (iv) COMPUTER READABLE FORM:
        (A) MEDIUM TYPE: DISKETTE, 3.50 INCH
        (B) COMPUTER: IBM PC COMPATIBLE
        (C) OPERATING SYSTEM: MICROSOFT WORD FOR WINDOWS
        (D) SOFTWARE: MICROSOFT WORD VERSION 1.0A

    (v) CURRENT APPLICATION DATA:
        (A) APPLICATION NUMBER:
        (B) FILING DATE:
        (C) CLASSIFICATION:

    (vi) PRIOR APPLICATION DATA:
        (A) APPLICATION NUMBER: C. 0/0 .7. 1 , 3 6 '1
        (B) FILING DATE: JUNE 2'i, 1996

    (vii) ATTORNEY/AGENT INFORMATION:
        (A) NAME: FLOYD, LINDA AXAMETHY
        (B) REGISTRATION NUMBER: 33,692
        (C) REFERENCE/DOCKET NUMBER: BA-9120

(2) INFORMATION FOR SEQ ID NO:1:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 233 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: cDNA

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CAAGAAACGN GTCGNCGACG TGCTCAGCGA TGATCAGATC AAGGAGTGTG AGGAATTAGG   60

GATTCTTNTA GACAGAGATS ATCAAGGGAC GTTNCTTCAA ATCTNCACAA AACCACTAGG  120

TGACAGGCCG ACGNTATTTA TAGAGATAAT CCAGAGNGTA GGATGCATCA TGAAAGATGT  180

GGAAGGGANG GCTTACCAGA GTGGAGNATN TNGTCGTTTT CGCAAAGGCA ATT         233

(2) INFORMATION FOR SEQ ID NO:2:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 1448 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: cDNA

    (ix) FEATURE:
        (A) NAME/KEY: CDS
        (B) LOCATION: 9..1343

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

TGAAATCA ATG GGC CAC CAA AAC GCC GCC GTT TCA GAG AAT CAA AAC CAT   50
         Met Gly His Gln Asn Ala Ala Val Ser Glu Asn Gln Asn His
         1               5                   10

GAT GAC GGC GCT GCG TCG TCG CCG GGA TTC AAG CTC GTC GGA TTT TCC   98
Asp Asp Gly Ala Ala Ser Ser Pro Gly Phe Lys Leu Val Gly Phe Ser
15                  20                  25                  30

AAG TTC GTA AGA AAG AAT CCA AAG TCT GAT AAA TTC AAG GTT AAG CGC  146
Lys Phe Val Arg Lys Asn Pro Lys Ser Asp Lys Phe Lys Val Lys Arg
                35                  40                  45

TTC CAT CAC ATC GAG TTC TGG TGC GGG GAC CCA ACC AAC GTC GCT CGT  194
Phe His His Ile Glu Phe Trp Cys Gly Asp Ala Thr Asn Val Ala Arg
            50                  55                  60

CGC TTC TCC TGG GGT CTG GGG ATG AGA TTC TCC GCC AAA TCC GAT CTT  242
Arg Phe Ser Trp Gly Leu Gly Met Arg Phe Ser Ala Lys Ser Asp Leu
        65                  70                  75

TCC ACC GGA AAC ATG GTT CAC GCC TCT TAG CTA CTC ACC TCC GGT GAA  290
Ser Thr Gly Asn Met Val His Ala Ser Tyr Leu Leu Thr Ser Gly Glu
    80                  85                  90

CTC CCA TTC CTT TTC ACT GCT CCT TAG TCT CCG TCT CTC TCC GGC GGA  338
Leu Arg Phe Leu Phe Thr Ala Pro Tyr Ser Pro Ser Leu Ser Gly Gly
95                  100                 105                 110

GAG ATT AAA CCG ACA ACC ACA GGT TCT ATC CCA AGT TTC GAT CAC GGG  386
Glu Ile Lys Pro Thr Thr Thr Gly Ser Ile Pro Ser Phe Asp His Gly
                115                 120                 125

TCT TCT CGG TCC TTC TTC TCT TCA CAT GGT CTC GGT GTT AGA CCC GTT  434
Ser Cys Arg Ser Phe Phe Ser Ser His Gly Leu Gly Val Arg Pro Val
            130                 135                 140

GCG ATT GAA GTA GAA GAC GCG GAG TCA GCT TTC TCC ATC AGT GTA GCT  482
Ala Ile Glu Val Glu Asp Ala Glu Ser Ala Phe Ser Ile Ser Val Ala
        145                 150                 155

AAT GGC GCT ATT CCT TCG TCG CCT CCT ATC GTC CTC AAT GAA GCA GTT  530
Asn Gly Ala Ile Pro Ser Ser Pro Pro Ile Val Leu Asn Glu Ala Val
    160                 165                 170

ACG ATC GCT GAG GTT AAA CTA TAC GGC GAT GTT GTT CTC CGA TAT GTT  578
Thr Ile Ala Glu Val Lys Leu Tyr Gly Asp Val Val Leu Arg Tyr Val
175                 180                 185                 190

AGT TAC AAA GTA GAA GAT ACC GAA AAA TCC GAA TTC TTG CCA GGG TTC  626
Ser Tyr Lys Ala Glu Asp Thr Glu Lys Ser Glu Phe Leu Pro Gly Phe
                195                 200                 205

GAG CCT GTA GAG GAT GCG TCG TCG TTC CCA TTG GAT TAT GGT ATC CGG  674
Glu Arg Val Glu Asp Ala Ser Ser Phe Pro Leu Asp Tyr Gly Ile Arg
            210                 215                 220

CGG CTT GAC CAC GCC GTG GGA AAC GTT CCT GAG CTT GGT CCG GCT TTA  722
Arg Leu Asp His Ala Val Gly Asn Val Pro Glu Leu Gly Pro Ala Leu
        225                 230                 235

ACT TAT GTA GTG GGG TTC ACT GGT TTT CAC CAA TTC GCA GAG TTC ACA  770
Thr Tyr Val Ala Gly Phe Thr Gly Phe His Gln Phe Ala Glu Phe Thr
    240                 245                 250

GCA GAT GAC CTT GGA ACC GCC GAG AGC GGT TTA AAT TCA GCG GTC CTG  818
Ala Asp Asp Val Gly Thr Ala Glu Ser Gly Leu Asn Ser Ala Val Leu
255                 260                 265                 270

GCT AGC AAT GAT GAA ATG GTT CTT CTA CCG ATT AAC GAG CCA CTG CAC  866
Ala Ser Asn Asp Glu Met Val Leu Leu Pro Ile Asn Glu Pro Val His
                275                 280                 285

GGA ACA AAG AGG AAG AGT CAG ATT CAG ACG TAT TTG GAA CAT AAC GAA  914
Gly Thr Lys Arg Lys Ser Gln Ile Gln Thr Tyr Leu Glu His Asn Glu
            290                 295                 300

GGC GCA GGG CTA CAA CAT CTG GCT CTG ATG AGT GAA GAC ATA TTC AGG  962
Gly Ala Gly Leu Gln His Leu Ala Leu Met Ser Glu Asp Ile Phe Arg
        305                 310                 315

ACC CTG AGA GAG ATG AGG AAG AGG AGC AGT ATT GGA GGA TTC GAC TTC 1010
Thr Leu Arg Glu Met Arg Lys Arg Ser Ser Ile Gly Gly Phe Asp Phe
    320                 325                 330

ATG CCT TCT CCT CCG CCT ACT TAC TAC CAG AAT CTC AAG AAA CGG GTC 1058
Met Pro Ser Pro Pro Pro Thr Tyr Tyr Gln Asn Leu Lys Lys Arg Val
335                 340                 345                 350

GGC GAC GTG TTC AGC GAT GAT CAG ATC AAG GAG TGT GAG GAA TTA GGG 1106
Gly Asp Val Leu Ser Asp Asp Gln Ile Lys Glu Cys Glu Glu Leu Gly
                355                 360                 365
355 360 365 
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ATT CTT GTA GAC AG A GAT GAT CAA GGG ACG TTG CTT CAA ATC TTC AC A 115-1 

lie Leu Val Asp Arg Asp Asp Gin Gly Thr Leu Leu Gin Hp Ph^ ~hr 

370 375 330 



AAA CCA CTA GGT GAC AGG CCG ACG ATA TTT ATA GAG ATA ATC CAG AGA 1202
Lys Pro Leu Gly Asp Arg Pro Thr Ile Phe Ile Glu Ile Ile Gln Arg
385 390 395

GTA GGA TGC ATG ATG AAA GAT GAG GAA GGG AAG GCT TAC CAG AGT GGA 1250
Val Gly Cys Met Met Lys Asp Glu Glu Gly Lys Ala Tyr Gln Ser Gly
400 405 410

GGA TGT GGT GGT TTT GCC AAA GGC AAT TTC TCT GAG CTC TTC AAG TCC 1298
Gly Cys Gly Gly Phe Ala Lys Gly Asn Phe Ser Glu Leu Phe Lys Ser
415 420 425 430

ATT GAA GAA TAC GAA AAG ACT CTT GAA GCC AAA CAG TTA GTG GGA 1343
Ile Glu Glu Tyr Glu Lys Thr Leu Glu Ala Lys Gln Leu Val Gly
435 440 445

TGAACAAGAA GAAGAACCAA CTAAAGGATT CTGTAATTAA TGTAAAACTG TTTTATCTTA 1403

TCAAAACAAT GTATACAACA TCTCATTTAA AAACGAGATC AATCC 1448

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 445 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

Met Gly His Gln Asn Ala Ala Val Ser Glu Asn Gln Asn - * ^s^
1 5 10 15

Gly Ala Ala Ser Ser Pro Gly Phe Lys Leu Val Gly °^r> --^ - * yc D he
20 25 30

Val Arg Lys Asn Pro Lys Ser Asp Lys Phe Lys Val Lys Ai .: Phe His
35 40 45

His Ile Glu Phe Trp Cys Gly Asp Ala Thr Asn Val Ala A— Arg Phe
50 55 60

Ser Trp Gly Leu Gly Met Arg Phe Ser Ala Lys Ser Asp he- Ser T-r
65 70 75 80

Gly Asn Met Val His Ala Ser Tyr Leu Leu Thr Ser Gly G 1 - Leu Arg
85 90 95

Phe Leu Phe Thr Ala Pro Tyr Ser Pro Ser Leu Ser Gly Gly Glu Ile
100 105 110

Lys Pro Thr Thr Thr Gly Ser Ile Pro Ser Phe Asp His Gly Ser Cys
115 120 125

Arg Ser Phe Phe Ser Ser His Gly Leu Gly Val Arg Pro Val Ala Ile
130 135 140

Glu Val Glu Asp Ala Glu Ser Ala Phe Ser Ile Ser Val Ala Asn Gly
145 150 155 160

Ala Ile Pro Ser Ser Pro Pro Ile Val Leu Asn Glu Ala Val Thr Ile
165 170 175

Ala Glu Val Lys Leu Tyr Gly Asp Val Val Leu Arg Tyr Val Ser Tyr
180 185 190

Lys Ala Glu Asp Thr Glu Lys Ser Glu Phe Leu Pro Gly Phe Glu Arg
195 200 205

Val Glu Asp Ala Ser Ser Phe Pro Leu Asp Tyr Gly Ile Arg Arg Leu
210 215 220

Asp His Ala Val Gly Asn Val Pro Glu Leu Gly Pro Ala Leu Thr Tyr
225 230 235 240

Val Ala Gly Phe Thr Gly Phe His Gln Phe Ala Glu Phe Thr Ala Asp
245 250 255

Asp Val Gly Thr Ala Glu Ser Gly Leu Asn Ser Ala Val Leu Ala Ser
260 265 270

Asn Asp Glu Met Val Leu Leu Pro Ile Asn Glu Pro Val His Gly Thr
275 280 285

Lys Arg Lys Ser Gln Ile Gln Thr Tyr Leu Glu His Asn Glu Gly Ala
290 295 300

Gly Leu Gln His Leu Ala Leu Met Ser Glu Asp Ile Phe Arg Thr Leu
305 310 315 320

Arg Glu Met Arg Lys Arg Ser Ser Ile Gly Gly Phe Asp Phe Met Pro
325 330 335

Ser Pro Pro Pro Thr Tyr Tyr Gln Asn Leu Lys Lys Arg Val Gly Asp
340 345 350

Val Leu Ser Asp Asp Gln Ile Lys Glu Cys Glu Glu Leu Gly Ile Leu
355 360 365

Val Asp Arg Asp Asp Gln Gly Thr Leu Leu Gln Ile Phe Thr Lys Pro
370 375 380

Leu Gly Asp Arg Pro Thr Ile Phe Ile Glu Ile Ile Gln Arg Val Gly
385 390 395 400

Cys Met Met Lys Asp Glu Glu Gly Lys Ala Tyr Gln Ser Gly Gly Cys
405 410 415

Gly Gly Phe Ala Lys Gly Asn Phe Ser Glu Leu Phe Lys Ser Ile Glu
420 425 430

Glu Tyr Glu Lys Thr Leu Glu Ala Lys Gln Leu Val Gly
435 440 

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 53 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

TATGTCCAAG TTCGTAAGAA AGAATCCAAA GTCTGATAAA TTCAAGGTTA AGC 53

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GCTTAACCTT GAATTTATCA GACTTTGGAT TCTTTCTTAC GAACTTGGAC A 
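
As a consistency check on the two oligonucleotides above, the short Python sketch below (an illustration, not part of the original listing) tests whether SEQ ID NO: 5 is the reverse complement of the 3' end of SEQ ID NO: 4 and reports which 5' bases of SEQ ID NO: 4 would remain unpaired if the two strands were annealed. The function names are assumptions introduced here; the sequences are copied from the listing with the spacing removed.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def overhang_after_annealing(top: str, bottom: str) -> str:
    """Return the 5' bases of `top` left unpaired when `bottom` pairs with its 3' end.

    Raises ValueError if `bottom` is not complementary to the 3' end of `top`.
    """
    if not reverse_complement(top).startswith(bottom):
        raise ValueError("bottom strand is not complementary to the 3' end of top")
    return top[: len(top) - len(bottom)]


# Sequences as printed in SEQ ID NO: 4 and SEQ ID NO: 5, spacing removed.
SEQ_ID_NO_4 = "TATGTCCAAGTTCGTAAGAAAGAATCCAAAGTCTGATAAATTCAAGGTTAAGC"
SEQ_ID_NO_5 = "GCTTAACCTTGAATTTATCAGACTTTGGATTCTTTCTTACGAACTTGGACA"

if __name__ == "__main__":
    print(len(SEQ_ID_NO_4), len(SEQ_ID_NO_5))
    print(overhang_after_annealing(SEQ_ID_NO_4, SEQ_ID_NO_5))
```

Under these assumptions the two strands pair over 51 bases and leave a two-base 5' overhang on SEQ ID NO: 4; the listing itself does not state how the oligonucleotides are used.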

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 392 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Thr Ser Tyr Ser Asp Lys Gly Gk Lys Pro Glu Arg Gly Arg Phe Leu
1 5 10 15

His Phe His Ser Val Thr Phe Trp Val Gly Asn Ala Lys Gln Ala Ala
20 25 30

Ser Tyr Tyr Cys Ser Lys Ile Gly Phe Glu Pro ^ Ala Tyr Lys Gly
35 40 45

Leu Glu Thr Gly Ser Arg Glu Val Val Ser His Val Val Lys Gln Asp
50 55 60

Lys Ile Val Phe Val Phe Ser Ser Ala Leu Asn Pro Trp Asn Lys Glu
65 70 75 80

Met Gly Asp His Leu Val Lys His Gly Asp Gly Val Lys Asp Ile Ala
85 90 95

Phe Glu Val Glu Asp Cys Asp Tyr Ile Val Gln Lys Ala Arg Glu Arg
100 105 110

Gly Ala Ile Ile Val Arg Glu Glu Val Cys Cys Ala Ala Asp Val Arg
115 120 125

Gly His His Thr Pro Leu Asp Arg Ala Arg Gln Val Trp Glu Gly Thr
130 135 140

Leu Val Glu Lys Met Thr Phe Cys Leu Asp Ser Arg Pro Gln Pro Ser
145 150 155 160

Gln Thr Leu Leu His Arg Leu Leu Leu Ser Lys Leu Pro Lys Cys Gly
165 170 175

Leu Glu Ile Ile Asp His Ile Val Gly Asn Gln Pro Asp Gln Glu Met
180 185 190

Glu Ser Ala Ser Gln Trp Tyr Met Arg Asn Leu Gln Phe His Arg Phe
195 200 205

Trp Ser Val Asp Asp Thr Gln Ile His Thr Glu Tyr Ser Ala Leu Arg
210 215 220

Ser Val Val Met Ala Asn Tyr Glu Glu Ser Ile Lys Met Pro Ile Asn
225 230 235 240

Glu Pro Ala Pro Gly Lys Lys Lys Ser Gln Ile Gln Glu Tyr Val Asp
245 250 255

Tyr Asn Gly Gly Ala Gly Val Gln His Ile Ala Leu Lys Thr Glu Asp
260 265 270

Ile Ile Thr Ala Ile Arg Ser Leu Arg Glu Arg Gly Val Glu Phe Leu
275 280 285

Ala Val Pro Phe Thr Tyr Tyr Lys Gln Leu Gln Glu Lys Leu Lys Ser
290 295 300

Ala Lys Ile Arg Val Lys Glu Ser Ile Asp Val Leu Glu Glu Leu Lys
305 310 315 320

Ile Leu Val Asp Tyr Asp Glu Lys Gly Tyr Leu Leu Gln Ile Phe Thr
325 330 335

Lys Pro Met Gln Asp Arg Pro Thr Val Phe Leu Glu Val Ile Gln Arg
340 345 350

Asn Asn His Gln Gly Phe Gly Ala Gly Asn Phe Asn Ser Leu Phe Lys
355 360 365

Ala Phe Glu Glu Glu Gln Glu Leu Arg Gly Asn Leu Thr Asp Thr Asp
370 375 380

Pro Asn Gly Val Pro Phe Arg Leu
385 390

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 392 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Thr Ser Tyr Ser Asp Lys Gly Glu Lys Pro Glu Arg Gly Arg Phe Leu
1 5 10 15

His Phe His Ser Val Thr Phe Trp Val Gly Asn Ala Lys Gln Ala Ala
20 25 30

Ser Tyr Tyr Cys Ser Lys Ile Gly Phe Glu Pro Leu Ala Tyr Lys Gly
35 40 45

Leu Glu Thr Gly Ser Arg Glu Val Val Ser His Val Val Lys Gln Asp
50 55 60

Lys Ile Val Phe Val Phe Ser Ser Ala Leu Asn Pro Trp Asn Lys Glu
65 70 75 80

Met Gly Asp His Leu Val Lys His Gly Asn Gly Val Lys Asp Ile Ala
85 90 95

Phe Glu Val Glu Asp Cys Asp Tyr Ile Val Gln Lys Ala Arg Glu Arg
100 105 110

Gly Ala Ile Ile Val Arg Glu Glu Val Cys Cys Ala Ala Asp Val Arg
115 120 125

Gly His His Thr Pro Leu Asp Arg Ala Arg Gln Val Trp Glu Gly Thr
130 135 140

Leu Val Glu Lys Met Thr Phe Cys Leu Asp Ser Arg Pro Gln Pro Ser
145 150 155 160

Gln Thr Leu Leu His Arg Leu Leu Leu Ser Lys Leu Pro Lys Cys Gly
165 170 175

Leu Glu Ile Ile Asp His Ile Val Gly Asn Gln Pro Asp Gln Glu Met
180 185 190

Glu Ser Ala Ser Gln Trp Tyr Met Arg Asn Leu Gln Phe His Arg Phe
195 200 205

Trp Ser Val Asp Asp Thr Gln Ile His Thr Glu Tyr Ser Ala Leu Arg
210 215 220

Ser Val Val Met Ala Asn Tyr Glu Glu Ser Ile Lys Met Pro Ile Asn
225 230 235 240

Glu Pro Ala Pro Gly Lys Lys Lys Ser Gln Ile Gln Glu Tyr Val Asp
245 250 255

Tyr Asn Gly Gly Ala Gly Val Gln His Ile Ala Leu Lys Thr Glu Asp
260 265 270

Ile Ile Thr Ala Ile Arg Ser Leu Arg Glu Arg Gly Val Glu Phe Leu
275 280 285

Ala Val Pro Phe Thr Tyr Tyr Lys Gln Leu Gln Glu Lys Leu Lys Ser
290 295 300

Ala Lys Ile Arg Val Lys Glu Ser Ile Asp Val Leu Glu Glu Leu Lys
305 310 315 320

Ile Leu Val Asp Tyr Asp Glu Lys Gly Tyr Leu Leu Gln Ile Phe Thr
325 330 335

Lys Pro Met Gln Asp Arg Pro Thr Val Phe Leu Glu Val Ile Gln Arg
340 345 350

Asn Asn His Gln Gly Phe Gly Ala Gly Asn Phe Asn Ser Leu Phe Lys
355 360 365

Ala Phe Glu Glu Glu Gln Glu Leu Arg Gly Asn Leu Thr Asp Thr Asp
370 375 380

Pro Asn Gly Val Pro Phe Arg Leu
385 390 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 392 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Thr Thr Tyr Asn Asn Lys Gly Pro Lys Pro Glu Arg Gly Arg Phe Leu
1 5 10 15

His Phe His Ser Val Thr Phe Trp Val Gly Asn Ala Lys Gln Ala Ala
20 25 30

Ser Phe Tyr Cys Asn Lys Met Gly Phe Glu Pro Leu Ala Tyr Arg Gly
35 40 45

Leu Glu Thr Gly Ser Arg Glu Val Val Ser His Val Ile Lys Arg Gly
50 55 60

Lys Ile Val Phe Val Leu Cys Ser Ala Leu Asn Pro Trp Asn Lys Glu
65 70 75 80

Met Gly Asp His Leu Val Lys His Gly Asp Gly Val Lys Asp Ile Ala
85 90 95

Phe Glu Val Glu Asp Cys Asp His Ile Val Gln Lys Ala Arg Glu Arg
100 105 110

Gly Ala Lys Ile Val Arg Glu Pro Trp Val Glu Gln Asp Lys Phe Gly
115 120 125

Lys Val Lys Phe Ala Val Leu Gln Thr Tyr Gly Asp Thr Thr His Thr
130 135 140

Leu Val Glu Lys Ile Asn Tyr Thr Gly Arg Phe Leu Pro Gly Phe Glu
145 150 155 160

Ala Pro Thr Tyr Lys Asp Thr Leu Leu Pro Lys Leu Pro Arg Cys Asn
165 170 175

Leu Glu Ile Ile Asp His Ile Val Gly Asn Gln Pro Asp Gln Glu Met
180 185 190

Gln Ser Ala Ser Glu Trp Tyr Leu Lys Asn Leu Gln Phe His Arg Phe
195 200 205

Trp Ser Val Asp Asp Thr Gln Val His Thr Glu Tyr Ser Ser Leu Arg
210 215 220

Ser Ile Val Val Thr Asn Tyr Glu Glu Ser Ile Lys Met Pro Ile Asn
225 230 235 240

Glu Pro Ala Pro Gly Arg Lys Lys Ser Gln Ile Gln Glu Tyr Val Asp
245 250 255

Tyr Asn Gly Gly Ala Gly Val Gln His Ile Ala Leu Lys Thr Glu Asp
260 265 270

Ile Ile Thr Ala Ile Arg His Leu Arg Glu Arg Gly Thr Glu Phe Leu
275 280 285

Ala Ala Pro Ser Ser Tyr Tyr Lys Leu Leu Arg Glu Asn Leu Lys Ser
290 295 300

Ala Lys Ile Gln Val Lys Glu Ser Met Asp Val Leu Glu Glu Leu His
305 310 315 320

Ile Leu Val Asp Tyr Asp Glu Lys Gly Tyr Leu Leu Gln Ile Phe Thr
325 330 335

Lys Pro Met Gln Asp Arg Pro Thr Leu Phe Leu Glu Val Ile Gln Arg
340 345 350

His Asn His Gln Gly Phe Gly Ala Gly Asn Phe Asn Ser Leu Phe Lys
355 360 365

Ala Phe Glu Glu Glu Gln Ala Leu Arg Gly Asn Leu Thr Asp Leu Glu
370 375 380

Pro Asn Gly Val Arg Ser Gly Met
385 390 

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 376 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Tyr Trp Asp Lys Gly Pro Lys Pro Glu Arg Gly Arg Phe Leu His Phe
1 5 10 15

His Ser Val Thr Phe Trp Val Gly Asn Ala Lys Gln Ala Ala Ser Phe
20 25 30

Tyr Cys Asn Lys Met Gly Phe Glu Pro Leu Ala Tyr Lys Gly Leu Glu
35 40 45

Thr Gly Ser Arg Glu Val Val Ser His Val Ile Lys Gln Gly Lys Ile
50 55 60

Val Phe Val Leu Cys Ser Ala Leu Asn Pro Trp Asn Lys Glu Met Gly
65 70 75 80

Asp His Leu Val Lys His Gly Asp Gly Val Lys Asp Ile Ala Phe Glu
85 90 95

Val Glu Asp Cys Glu His Ile Val Gln Lys Ala Arg Glu Arg Gly Ala
100 105 110

Lys Ile Val Arg Glu Pro Trp Val Glu Glu Asp Lys Phe Gly Lys Val
115 120 125

Lys Phe Ala Val Leu Gln Thr Tyr Gly Asp Thr Thr His Thr Leu Val
130 135 140

Glu Lys Ile Asn Tyr Thr Gly Arg Phe Leu Pro Gly Phe Glu Ala Pro
145 150 155 160

Thr Tyr Lys Asp Thr Leu Leu Pro Lys Leu Pro Ser Cys Asn Leu Glu
165 170 175

Ile Ile Asp His Ile Val Gly Asn Gln Pro Asp Gln Glu Met Glu Ser
180 185 190

Ala Ser Glu Trp Tyr Leu Lys Asn Leu Gln Phe His Arg Phe Trp Ser
195 200 205

Val Asp Asp Thr Gln Val His Thr Glu Tyr Ser Ser Leu Arg Ser Ile
210 215 220

Val Val Ala Asn Tyr Glu Glu Ser Ile Lys Met Pro Ile Asn Glu Pro
225 230 235 240

Ala Pro Gly Arg Lys Lys Ser Gln Ile Gln Glu Tyr Val Asp Tyr Asn
245 250 255

Gly Gly Ala Gly Val Gln His Ile Ala Leu Arg Thr Glu Asp Ile Ile
260 265 270

Thr Thr Ile Arg His Leu Arg Glu Arg Gly Met Glu Phe Leu Ala Val
275 280 285

Pro Ser Ser Tyr Tyr Arg Leu Leu Arg Glu Asn Leu Lys Thr Ser Lys
290 295 300

Ile Gln Val Lys Glu Asn Met Asp Val Leu Glu Glu Leu Lys Ile Leu
305 310 315 320

Val Asp Tyr Asp Glu Lys Gly Tyr Leu Leu Gln Ile Phe Thr Lys Pro
325 330 335

Met Gln Asp Arg Pro Thr Leu Phe Leu Glu Val Ile Gln Arg His Asn
340 345 350

His Gln Gly Phe Gly Ala Gly Asn Phe Asn Ser Leu Phe Lys Ala Phe
355 360 365

Glu Glu Glu Gln Ala Leu Arg Gly
370 375 

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1766 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Zea mays

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 261..1595

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
ACTAGTTGTG AGAGCCTTCT GCGTTGGCAA TTGGCAGTAC AAGACAAATC ACATCCGCAA 60 
CCCCAACCAC AGAATCGTCC GTCCACGTGG CCCCCATCAC TTCCCTTTAT TTACCAGTCG 120 
TCCCCCATCC CCAGGGCCAC CCACCAACAA GTGCAGTCAC CCGAGCCGCA AACTGCAGCT 180 
CTGCAAGCTA CAGAGGCCAC CACGAGTCCA CGACGCCACG CCCTCCGAGA GAAAGAGAAA 240
GAGAAAACCA AAGCACGATA ATG CCG CCG ACC CCC ACA GCC GCC GCA GCC 290
Met Pro Pro Thr Pro Thr Ala Ala Ala Ala
1 5 10

GGC GCC GCC GTG GCG GCG GCA TCA GCA GCG GAG CAA GCG GCG TTC ^G~ 338
Gly Ala Ala Val Ala Ala Ala Ser Ala Ala Glu Gln Ala Ala Phe Arg
15 20 25

CTC GTG GGC CAC CGC AAC TTC GTC CGC TTC AAC CCG CGC TCC GAC C^r 386
Leu Val Gly His Arg Asn Phe Val Arg Phe Asn Pro Arg Ser Asp Arg
30 35 40

TTC CAC ACG CTC GCG TTC CAC CAC GTG GAG CTC TGG TGC GCC GAC GCr- 434
Phe His Thr Leu Ala Phe His His Val Glu Leu Trp Cys Ala Asp Ala
45 50 55

GCC TCC GCC GCG GGC CGC TTC TCC TTC GGC CTG GGC GCG CCG CTC G^ 482
Ala Ser Ala Ala Gly Arg Phe Ser Phe Gly Leu Gly Ala Pro Leu Ala
60 65 70

GCA CGC TCC GAC CTC TCC ACG GGC AAC TCC GCG CAC GCG TC^ r-r G rT ,~ 530
Ala Arg Ser Asp Leu Ser Thr Gly Asn Ser Ala His Ala Ser Leu Leu
75 80 85 90

CTC CGC TCC GGC TCC CTC TCC TTC CTC TTC ACG GCG CCC TAC GCG CA^* 578
Leu Arg Ser Gly Ser Leu Ser Phe Leu Phe Thr Ala Pro Tyr Ala His
95 100 105

GGC GCC GAC GCT GCC ACC GCC GCG CTG CCC TCC TTP TC" GCr GCC GC^ 626
Gly Ala Asp Ala Ala Thr Ala Ala Leu Pro Ser Phe Ser Ala Ala Ala
110 115 120

GCG CGG CGC TTC GCA GCC GAC CAC GGC CTC GCG GTG CGC GCC GTC GC" 674
Ala Arg Arg Phe Ala Ala Asp His Gly Leu Ala Val Arg Ala Val Ala
125 130 135

CTC CGC GTC GCC GAC GCC GAG GAC GCC TTC CGC GCC AGC GTC GCG GCC 722
Leu Arg Val Ala Asp Ala Glu Asp Ala Phe Arg Ala Ser Val Ala Ala
140 145 150

GGG GCG CGC CCG GCG TTC GGC CCC GTC GAC CTC GGC CGC GGC TTC CGC 770
Gly Ala Arg Pro Ala Phe Gly Pro Val Asp Leu Gly Arg Gly Phe Arg
155 160 165 170

CTC GCC GAG GTC GAG CTC TAC GGC GAC GTC GTG CTC CGG TAC GTG AGC 818
Leu Ala Glu Val Glu Leu Tyr Gly Asp Val Val Leu Arg Tyr Val Ser
175 180 185

TAC CCG GAC GGC GCC GCG GGC GAG CCC TTC CTG CCG GGG TTC GAG GGC 866
Tyr Pro Asp Gly Ala Ala Gly Glu Pro Phe Leu Pro Gly Phe Glu Gly
190 195 200

GTG GCC AGC CCC GGG GCG GCC GAC TAC GGG CTG AGC AGG TTC GAC CAC 914
Val Ala Ser Pro Gly Ala Ala Asp Tyr Gly Leu Ser Arg Phe Asp His
205 210 215

ATC GTC GGC AAC GTG CCG GAG CTG GCG CCC GCC GCC GCC TAC TTC GCC 962
Ile Val Gly Asn Val Pro Glu Leu Ala Pro Ala Ala Ala Tyr Phe Ala
220 225 230

GGC TTC ACG GGG TTC CAC GAG TTC GCC GAG TTC ACG ACG GAG GAC GTG 1010
Gly Phe Thr Gly Phe His Glu Phe Ala Glu Phe Thr Thr Glu Asp Val
235 240 245 250

GGC ACC GCG GAG AGC GGC CTC AAC TCC ATG GTG CTC GCC AAC TCG 1058
Gly Thr Ala Glu Ser Gly Leu Asn Ser Met Val Leu Ala Asn Asn Ser
255 260 265

GAG AAC GTG CTG CTC CCG CTC AAC GAG CCG GTG CAC GGC ACC AAG CGC 1106
Glu Asn Val Leu Leu Pro Leu Asn Glu Pro Val His Gly Thr Lys Arg
270 275 280

CGC AGC CAG ATA CAA ACG TTC CTG GAC CAC CAC GGC GGC CCC GGC GTG 1154
Arg Ser Gln Ile Gln Thr Phe Leu Asp His His Gly Gly Pro Gly Val
285 290 295

CAG CAC ATG GCG CTG GCC AGC GAC GAC GTG CTC AGG ACG CTG AGG GAG 1202
Gln His Met Ala Leu Ala Ser Asp Asp Val Leu Arg Thr Leu Arg Glu
300 305 310

ATG CAG GCG CGC TCG GCC ATG GGC GGC TTC GAG TTC ATG GCG CCT CCC 1250
Met Gln Ala Arg Ser Ala Met Gly Gly Phe Glu Phe Met Ala Pro Pro
315 320 325 330

xrp TCC GAC TAC TAT GAC GGC GTG AGG CGG CGC GCC GGG GAC GTG CTC 1298
Thr Ser Asp Tyr Tyr Asp Gly Val Arg Arg Arg Ala Gly Asp Val Leu
335 340 345

:rr. GAA GCA CAG ATT AAG GAG TGC CAG GAG CTA GGG GTG CTG GTG GAC 1346
Thr Glu Ala Gln Ile Lys Glu Cys Gln Glu Leu Gly Val Leu Val Asp
350 355 360

AGG GAT GAC CAG GGC GTG CTC CTC CAA ATC TTC ACC AAG CCA GTG GGG 1394
Arg Asp Asp Gln Gly Val Leu Leu Gln Ile Phe Thr Lys Pro Val Gly
365 370 375

GAC AGG CCA ACG CTG TTC TTG GAA ATC ATC CAA AGG ATC GGG TGC ATG 1442
Asp Arg Pro Thr Leu Phe Leu Glu Ile Ile Gln Arg Ile Gly Cys Met
380 385 390

GAG AAG GAT GAG AAG GGG CAA GAA TAC CAA AAG GGT GGC TGC GGC GGG 1490
Glu Lys Asp Glu Lys Gly Gln Glu Tyr Gln Lys Gly Gly Cys Gly Gly
395 400 405 410

TTC GGC AAG GGA AAC TTC TCG CAG CTG TTC AAG TCC ATC GAG GAT TAT 1538
Phe Gly Lys Gly Asn Phe Ser Gln Leu Phe Lys Ser Ile Glu Asp Tyr
415 420 425

GAG AAG TCC CTT GAA GCC AAG CAA GCT GCT GCA GCA GCT GCA GCT CAG 1586
Glu Lys Ser Leu Glu Ala Lys Gln Ala Ala Ala Ala Ala Ala Ala Gln
430 435 440

GGA TCC TAG GACAGTGCTT GGAGACGAGC AACTGCTGTG GCACTTTGTA 1635
Gly Ser

TCATGGAACA GAAATAATGA AGCGTGTTCT TTGTGACACT TGACATGCAA ATGTTTGTGT 1695

TCTGTAACCG TTGAATATAT GGGACGATGC TATGATGGTG TAATAGATGG TAGAGAGGGT 1755

ACAACCCTGA T 1766
(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 445 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Pro Pro Thr Pro Thr Ala Ala Ala Ala Gly Ala Ala Val Ala Ala
1 5 10 15

Ala Ser Ala Ala Glu Gln Ala Ala Phe Arg Leu Val Gly His Arg Asn
20 25 30

Phe Val Arg Phe Asn Pro Arg Ser Asp Arg Phe His Thr Leu Ala Phe
35 40 45

His His Val Glu Leu Trp Cys Ala Asp Ala Ala Ser Ala Ala Gly Arg
50 55 60

Phe Ser Phe Gly Leu Gly Ala Pro Leu Ala Ala Arg Ser Asp Leu Ser
65 70 75 80

Thr Gly Asn Ser Ala His Ala Ser Leu Leu Leu Arg Ser Gly Ser Leu
85 90 95

Ser Phe Leu Phe Thr Ala Pro Tyr Ala His Gly Ala Asp Ala Ala Thr
100 105 110

Ala Ala Leu Pro Ser Phe Ser Ala Ala Ala Ala Arg Arg Phe Ala Ala
115 120 125

Asp His Gly Leu Ala Val Arg Ala Val Ala Leu Arg Val Ala Asp Ala
130 135 140

Glu Asp Ala Phe Arg Ala Ser Val Ala Ala Gly Ala Arg Pro Ala Phe
145 150 155 160

Gly Pro Val Asp Leu Gly Arg Gly Phe Arg Leu Ala Glu Val Glu Leu
165 170 175

Tyr Gly Asp Val Val Leu Arg Tyr Val Ser Tyr Pro Asp Gly Ala Ala
180 185 190

Gly Glu Pro Phe Leu Pro Gly Phe Glu Gly Val Ala Ser Pro Gly Ala
195 200 205

Ala Asp Tyr Gly Leu Ser Arg Phe Asp His Ile Val Gly Asn Val Pro
210 215 220

Glu Leu Ala Pro Ala Ala Ala Tyr Phe Ala Gly Phe Thr Gly Phe His
225 230 235 240

Glu Phe Ala Glu Phe Thr Thr Glu Asp Val Gly Thr Ala Glu Ser Gly
245 250 255

Leu Asn Ser Met Val Leu Ala Asn Asn Ser Glu Asn Val Leu Leu Pro
260 265 270

Leu Asn Glu Pro Val His Gly Thr Lys Arg Arg Ser Gln Ile Gln Thr
275 280 285

Phe Leu Asp His His Gly Gly Pro Gly Val Gln His Met Ala Leu Ala
290 295 300

Ser Asp Asp Val Leu Arg Thr Leu Arg Glu Met Gln Ala Arg Ser Ala
305 310 315 320

Met Gly Gly Phe Glu Phe Met Ala Pro Pro Thr Ser Asp Tyr Tyr Asp
325 330 335

Gly Val Arg Arg Arg Ala Gly Asp Val Leu Thr Glu Ala Gln Ile Lys
340 345 350

Glu Cys Gln Glu Leu Gly Val Leu Val Asp Arg Asp Asp Gln Gly Val
355 360 365

Leu Leu Gln Ile Phe Thr Lys Pro Val Gly Asp Arg Pro Thr Leu Phe
370 375 380

Leu Glu Ile Ile Gln Arg Ile Gly Cys Met Glu Lys Asp Glu Lys Gly
385 390 395 400

Gln Glu Tyr Gln Lys Gly Gly Cys Gly Gly Phe Gly Lys Gly Asn Phe
405 410 415

Ser Gln Leu Phe Lys Ser Ile Glu Asp Tyr Glu Lys Ser Leu Glu Ala
420 425 430

Lys Gln Ala Ala Ala Ala Ala Ala Ala Gln Gly Ser
435 440

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1356 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Arabidopsis thaliana

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..1254

(ix) FEATURE:

(A) NAME/KEY: misc_feature

(B) LOCATION: 1..3

(D) OTHER INFORMATION: /standard_name= "translation initiation codon"

(ix) FEATURE:

(A) NAME/KEY: misc_feature
(B) LOCATION: 1252..1254

(D) OTHER INFORMATION: /standard_name= "translation termination codon"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

ATG TCC AAG TTC GTA AGA AAG AAT CCA AAG TCT GAT AAA TTC AAG GTT 48
Met Ser Lys Phe Val Arg Lys Asn Pro Lys Ser Asp Lys Phe Lys Val
1 5 10 15

AAG CGC TTC CAT CAC ATC GAG TTC TGG TGC GGC GAC GCA ACC AAC GTC 96
Lys Arg Phe His His Ile Glu Phe Trp Cys Gly Asp Ala Thr Asn Val
20 25 30

GCT CGT CGC TTC TCC TGG GGT CTG GGG ATG AGA TTC TCC GCC AAA TCC 144
Ala Arg Arg Phe Ser Trp Gly Leu Gly Met Arg Phe Ser Ala Lys Ser
35 40 45

GAT CTT TCC ACC GGA AAC ATG GTT CAC GCC TCT TAC CTA CTC ACC TCC 192
Asp Leu Ser Thr Gly Asn Met Val His Ala Ser Tyr Leu Leu Thr Ser
50 55 60

GGT GAC CTC CGA TTC CTT TTC ACT GCT CCT TAC TCT CCG TCT CTC TCC 240
Gly Asp Leu Arg Phe Leu Phe Thr Ala Pro Tyr Ser Pro Ser Leu Ser
65 70 75 80

GCC GGA GAG ATT AAA CCG ACA ACC ACA GCT TCT ATC CCA AGT TTC GAT 288
Ala Gly Glu Ile Lys Pro Thr Thr Thr Ala Ser Ile Pro Ser Phe Asp
85 90 95

CAC GGC TCT TGT CGT TCC TTC TTC TCT TCA CAT GGT CTC GGT GTT AGA 336
His Gly Ser Cys Arg Ser Phe Phe Ser Ser His Gly Leu Gly Val Arg
100 105 110

GCC GTT GCG ATT GAA GTA GAA GAC GCA GAG TCA GCT TTC TCC ATC AGT 384
Ala Val Ala Ile Glu Val Glu Asp Ala Glu Ser Ala Phe Ser Ile Ser
115 120 125

GTA GCT AAT GGC GCT ATT CCT TCC TCC CCT CCT ATC GTC CTC AAT GAA 432
Val Ala Asn Gly Ala Ile Pro Ser Ser Pro Pro Ile Val Leu Asn Glu
130 135 140

GCA GTT ACG ATC GCT GAG GTT AAA CTA TAC GGC GAT GTT GTT CTC CGA 480
Ala Val Thr Ile Ala Glu Val Lys Leu Tyr Gly Asp Val Val Leu Arg
145 150 155 160

TAT GTT AGT TAC AAA GCA GAA GAT ACC GAA AAA TCC GAA TTC TTG CCA 528
Tyr Val Ser Tyr Lys Ala Glu Asp Thr Glu Lys Ser Glu Phe Leu Pro
165 170 175

GGG TTC GAG CGT GTA GAG GAT GCG TCG TCG TTC CCA TTG GAT TAT GGT 576
Gly Phe Glu Arg Val Glu Asp Ala Ser Ser Phe Pro Leu Asp Tyr Gly
180 185 190

ATC CGG CGG CTT GAC CAC GCC GTG GGA AAC GTT CCT GAG CTT GGT CCG 624
Ile Arg Arg Leu Asp His Ala Val Gly Asn Val Pro Glu Leu Gly Pro
195 200 205

GCT TTA ACT TAT GTA GCG GGG TTC ACT GGT TTT CAC CAA TTC GCA GAG 672
Ala Leu Thr Tyr Val Ala Gly Phe Thr Gly Phe His Gln Phe Ala Glu
210 215 220

TTC ACA GCA GAC GAC GTT GGA ACC GCC GAG AGC GGT TTA AAT TCA GCG 720
Phe Thr Ala Asp Asp Val Gly Thr Ala Glu Ser Gly Leu Asn Ser Ala
225 230 235 240

GTC CTG GCT AGC AAT GAT GAA ATG GTT CTT CTA CCG ATT AAC GAG CCA 768
Val Leu Ala Ser Asn Asp Glu Met Val Leu Leu Pro Ile Asn Glu Pro
245 250 255

GTG CAC GGA ACA AAG AGG AAG AGT CAG ATT CAG ACG TAT TTG GAA CAT 816
Val His Gly Thr Lys Arg Lys Ser Gln Ile Gln Thr Tyr Leu Glu His
260 265 270

AAC GAA GGC GCA GGG CTA CAA CAT CTG GCT CTG ATG AGT GAA GAC ATA 864
Asn Glu Gly Ala Gly Leu Gln His Leu Ala Leu Met Ser Glu Asp Ile
275 280 285

TTC AGG ACC CTG AGA GAG ATG AGG AAG AGG AGC AGT ATT GGA GGA TTC 912
Phe Arg Thr Leu Arg Glu Met Arg Lys Arg Ser Ser Ile Gly Gly Phe
290 295 300

GAC TTC ATG CCT TCT CCT CCG CCT ACT TAC TAC CAG AAT CTC AAG AAA 960
Asp Phe Met Pro Ser Pro Pro Pro Thr Tyr Tyr Gln Asn Leu Lys Lys
305 310 315 320

CGG GTC GGC GAC GTG CTC AGC GAT GAT CAG ATC AAG GAG TGT GAG GAA 1008
Arg Val Gly Asp Val Leu Ser Asp Asp Gln Ile Lys Glu Cys Glu Glu
325 330 335

TTA GGG ATT CTT GTA GAC AGA GAT GAT CAA GGG ACG TTG CTT CAA ATC 1056
Leu Gly Ile Leu Val Asp Arg Asp Asp Gln Gly Thr Leu Leu Gln Ile
340 345 350

TTC ACA AAA CCA CTA GGT GAC AGG CCG ACG ATA TTT ATA GAG ATA ATC 1104
Phe Thr Lys Pro Leu Gly Asp Arg Pro Thr Ile Phe Ile Glu Ile Ile
355 360 365

CAG AGA GTA GGA TGC ATG ATG AAA GAT GAG GAA GGG AAG GCT TAC CAG 1152
Gln Arg Val Gly Cys Met Met Lys Asp Glu Glu Gly Lys Ala Tyr Gln
370 375 380

AGT GGA GGA TGT GGT GGT TTT GGC AAA GGC AAT TTC TCT GAG CTC TTC 1200
Ser Gly Gly Cys Gly Gly Phe Gly Lys Gly Asn Phe Ser Glu Leu Phe
385 390 395 400

AAG TCC ATT GAA GAA TAC GAA AAG ACT CTT GAA GCC AAA CAG TTA GTG 1248
Lys Ser Ile Glu Glu Tyr Glu Lys Thr Leu Glu Ala Lys Gln Leu Val
405 410 415

GGA TGA ACAAGAAGAA GAACCAACTA AAGGATTCTG TAATTAATGT AAAACTGTTT 1304
Gly *

TATCTTATCA AAACAATGTA TACAACATCT CATTTAAAAA CGAGATCAAT CC 1356
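
The codon-by-codon layout of SEQ ID NO: 12 can be spot-checked with a small translation routine. The Python sketch below is an illustration rather than part of the listing: it builds the standard genetic code table, translates the first 48 nucleotides of SEQ ID NO: 12 as printed above into one-letter amino acid codes, and counts the codons spanned by a CDS feature such as 1..1254. The helper names `translate` and `codons_in_feature` are assumptions introduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a termination codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def codons_in_feature(start: int, end: int) -> int:
    """Number of codons in a 1-based, inclusive feature location such as 1..1254."""
    length = end - start + 1
    if length % 3:
        raise ValueError("feature length is not a multiple of three")
    return length // 3


# First line of SEQ ID NO: 12 (positions 1..48), spacing removed.
SEQ_ID_NO_12_START = "ATGTCCAAGTTCGTAAGAAAGAATCCAAAGTCTGATAAATTCAAGGTT"

if __name__ == "__main__":
    print(translate(SEQ_ID_NO_12_START))
    print(codons_in_feature(1, 1254))  # includes the termination codon at 1252..1254
```

The one-letter output corresponds to the three-letter line printed under the first 48 nucleotides (Met Ser Lys Phe Val Arg Lys Asn Pro Lys Ser Asp Lys Phe Lys Val).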
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 418 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Met Ser Lys Phe Val Arg Lys Asn Pro Lys Ser Asp Lys Phe Lys Val
1 5 10 15

Lys Arg Phe His His Ile Glu Phe Trp Cys Gly Asp Ala Thr Asn Val
20 25 30

Ala Arg Arg Phe Ser Trp Gly Leu Gly Met Arg Phe Ser Ala Lys Ser
35 40 45

Asp Leu Ser Thr Gly Asn Met Val His Ala Ser Tyr Leu Leu Thr Ser
50 55 60

Gly Asp Leu Arg Phe Leu Phe Thr Ala Pro Tyr Ser Pro Ser Leu Ser
65 70 75 80

Ala Gly Glu Ile Lys Pro Thr Thr Thr Ala Ser Ile Pro Ser Phe Asp
85 90 95

His Gly Ser Cys Arg Ser Phe Phe Ser Ser His Gly Leu Gly Val Arg
100 105 110

Ala Val Ala Ile Glu Val Glu Asp Ala Glu Ser Ala Phe Ser Ile Ser
115 120 125

Val Ala Asn Gly Ala Ile Pro Ser Ser Pro Pro Ile Val Leu Asn Glu
130 135 140

Ala Val Thr Ile Ala Glu Val Lys Leu Tyr Gly Asp Val Val Leu Arg
145 150 155 160

Tyr Val Ser Tyr Lys Ala Glu Asp Thr Glu Lys Ser Glu Phe Leu Pro
165 170 175

Gly Phe Glu Arg Val Glu Asp Ala Ser Ser Phe Pro Leu Asp Tyr Gly
180 185 190

Ile Arg Arg Leu Asp His Ala Val Gly Asn Val Pro Glu Leu Gly Pro
195 200 205

Ala Leu Thr Tyr Val Ala Gly Phe Thr Gly Phe His Gln Phe Ala Glu
210 215 220

Phe Thr Ala Asp Asp Val Gly Thr Ala Glu Ser Gly Leu Asn Ser Ala
225 230 235 240

Val Leu Ala Ser Asn Asp Glu Met Val Leu Leu Pro Ile Asn Glu Pro
245 250 255

Val His Gly Thr Lys Arg Lys Ser Gln Ile Gln Thr Tyr Leu Glu His
260 265 270

Asn Glu Gly Ala Gly Leu Gln His Leu Ala Leu Met Ser Glu Asp Ile
275 280 285

Phe Arg Thr Leu Arg Glu Met Arg Lys Arg Ser Ser Ile Gly Gly Phe
290 295 300

Asp Phe Met Pro Ser Pro Pro Pro Thr Tyr Tyr Gln Asn Leu Lys Lys
305 310 315 320

Arg Val Gly Asp Val Leu Ser Asp Asp Gln Ile Lys Glu Cys Glu Glu
325 330 335

Leu Gly Ile Leu Val Asp Arg Asp Asp Gln Gly Thr Leu Leu Gln Ile
340 345 350

Phe Thr Lys Pro Leu Gly Asp Arg Pro Thr Ile Phe Ile Glu Ile Ile
355 360 365

Gln Arg Val Gly Cys Met Met Lys Asp Glu Glu Gly Lys Ala Tyr Gln
370 375 380

Ser Gly Gly Cys Gly Gly Phe Gly Lys Gly Asn Phe Ser Glu Leu Phe
385 390 395 400

Lys Ser Ile Glu Glu Tyr Glu Lys Thr Leu Glu Ala Lys Gln Leu Val
405 410 415

Gly

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1443 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Arabidopsis thaliana

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 9..1346

(ix) FEATURE:

(A) NAME/KEY: misc_feature
(B) LOCATION: 9..11

(D) OTHER INFORMATION: /standard_name= "translation initiation codon"

(ix) FEATURE:

(A) NAME/KEY: misc_feature
(B) LOCATION: 1344..1346

(D) OTHER INFORMATION: /standard_name= "translation termination codon"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

TGAAATCA ATG GGC CAC CAA AAC GCC GCC GTT TCA GAG AAT CAA AAC CAT 50 
Met Glv His Gin Asn Ala Ala Val Ser Glu Asn Gin Asn His 
15 10 

GAT CAC GGC GCT GCG TCG TCG CCG GGA TTC AAG CTC GTC GGA TTT TCC 98 
Asp Asp Gly Ala Ala Ser Ser Pro Glv Phe Lvs Leu Val Gly Phe Ser 
15 20 2z 30 

AAG TTC GTA AGA AAG AAT CCA AAG TCT GAT AAA TTC AAG GTT AAG CGC 146
Lys Phe Val Arg Lys Asn Pro Lys Ser Asp Lys Phe Lys Val Lys Arg
35 40 45

TTC CAT CAC ATC GAG TTC TGG TGC GGC GAC GCA ACC AAC GTC GCT CGT 194
Phe His His Ile Glu Phe Trp Cys Gly Asp Ala Thr Asn Val Ala Arg
50 55 60

CGC TTC TCC TGG GGT CTG GGG ATG AGA TTC TCC GCC AAA TCC GAT CTT 242
Arg Phe Ser Trp Gly Leu Gly Met Arg Phe Ser Ala Lys Ser Asp Leu
65 70 75

TCC ACC GGA AAC ATG GTT CAC GCC TCT TAC CTA CTC ACC TCC GGT GAC 290
Ser Thr Gly Asn Met Val His Ala Ser Tyr Leu Leu Thr Ser Gly Asp
80 85 90

CTC CGA TTC CTT TTC ACT GCT CCT TAC TCT CCG TCT CTC TCC GCC GGA 338
Leu Arg Phe Leu Phe Thr Ala Pro Tyr Ser Pro Ser Leu Ser Ala Gly
95 100 105 110

GAG ATT AAA CCG ACA ACC ACA GCT TCT ATC CCA AGT TTC GAT CAC GGC 386
Glu Ile Lys Pro Thr Thr Thr Ala Ser Ile Pro Ser Phe Asp His Gly
115 120 125

TCT TGT CGT TCC TTC TTC TCT TCA CAT GGT CTC GGT GTT AGA GCC GTT 434
Ser Cys Arg Ser Phe Phe Ser Ser His Gly Leu Gly Val Arg Ala Val
130 135 140

GCG ATT GAA GTA GAA GAC GCA GAG TCA GCT TTC TCC ATC AGT GTA GCT 482
Ala Ile Glu Val Glu Asp Ala Glu Ser Ala Phe Ser Ile Ser Val Ala
145 150 155

AAT GGC GCT ATT CCT TCG TCG CCT CCT ATC GTC CTC AAT GAA GCA GTT 530
Asn Gly Ala Ile Pro Ser Ser Pro Pro Ile Val Leu Asn Glu Ala Val
160 165 170

ACG ATC GCT GAG GTT AAA CTA TAC GGC GAT GTT GTT CTC CGA TAT GTT 578
Thr Ile Ala Glu Val Lys Leu Tyr Gly Asp Val Val Leu Arg Tyr Val
175 180 185 190

AGT TAC AAA GCA GAA GAT ACC GAA AAA TCC GAA TTC TTG CCA GGG TTC 626
Ser Tyr Lys Ala Glu Asp Thr Glu Lys Ser Glu Phe Leu Pro Gly Phe
195 200 205

GAG CGT GTA GAG GAT GCG TCG TCG TTC CCA TTG GAT TAT GGT ATC CGG 674
Glu Arg Val Glu Asp Ala Ser Ser Phe Pro Leu Asp Tyr Gly Ile Arg
210 215 220

CGG CTT GAC CAC GCC GTG GGA AAC GTT CCT GAG CTT GGT CCG GCT TTA 722
Arg Leu Asp His Ala Val Gly Asn Val Pro Glu Leu Gly Pro Ala Leu
225 230 235

ACT TAT GTA GCG GGG TTC ACT GGT TTT CAC CAA TTC GCA GAG TTC ACA 770
Thr Tyr Val Ala Gly Phe Thr Gly Phe His Gln Phe Ala Glu Phe Thr
240 245 250

GCA GAC GAC GTT GGA ACC GCC GAG AGC GGT TTA AAT TCA GCG GTC CTG 818
Ala Asp Asp Val Gly Thr Ala Glu Ser Gly Leu Asn Ser Ala Val Leu
255 260 265 270

GCT AGC AAT GAT GAA ATG GTT CTT CTA CCG ATT AAC GAG CCA GTG CAC 866
Ala Ser Asn Asp Glu Met Val Leu Leu Pro Ile Asn Glu Pro Val His
275 280 285

GGA ACA AAG AGG AAG AGT CAG ATT CAG ACG TAT TTG GAA CAT AAC GAA 914
Gly Thr Lys Arg Lys Ser Gln Ile Gln Thr Tyr Leu Glu His Asn Glu
290 295 300

GGC GCA GGG CTA CAA CAT CTG GCT CTG ATG AGT GAA GAC ATA TTC AGG 962
Gly Ala Gly Leu Gln His Leu Ala Leu Met Ser Glu Asp Ile Phe Arg
305 310 315

ACC CTG AGA GAG ATG AGG AAG AGG AGC AGT ATT GGA GGA TTC GAC TTC 1010
Thr Leu Arg Glu Met Arg Lys Arg Ser Ser Ile Gly Gly Phe Asp Phe
320 325 330

ATG CCT TCT CCT CCG CCT ACT TAC TAC CAG AAT CTC AAG AAA CGG GTC 1058
Met Pro Ser Pro Pro Pro Thr Tyr Tyr Gln Asn Leu Lys Lys Arg Val
335 340 345 350

GGC GAC GTG CTC AGC GAT GAT CAG ATC AAG GAG TGT GAG GAA TTA GGG 1106
Gly Asp Val Leu Ser Asp Asp Gln Ile Lys Glu Cys Glu Glu Leu Gly
355 360 365

ATT CTT GTA GAC AGA GAT GAT CAA GGG ACG TTG CTT CAA ATC TTC ACA 1154
Ile Leu Val Asp Arg Asp Asp Gln Gly Thr Leu Leu Gln Ile Phe Thr
370 375 380

AAA CCA CTA GGT GAC AGG CCG ACG ATA TTT ATA GAG ATA ATC CAG AGA 1202
Lys Pro Leu Gly Asp Arg Pro Thr Ile Phe Ile Glu Ile Ile Gln Arg
385 390 395

GTA GGA TGC ATG ATG AAA GAT GAG GAA GGG AAG GCT TAC CAG AGT GGA 1250
Val Gly Cys Met Met Lys Asp Glu Glu Gly Lys Ala Tyr Gln Ser Gly
400 405 410

GGA TGT GGT GGT TTT GGC AAA GGC AAT TTC TCT GAG CTC TTC AAG TCC 1298
Gly Cys Gly Gly Phe Gly Lys Gly Asn Phe Ser Glu Leu Phe Lys Ser
415 420 425 430

ATT GAA GAA TAC GAA AAG ACT CTT GAA GCC AAA CAG TTA GTG GGA TGA 1346
Ile Glu Glu Tyr Glu Lys Thr Leu Glu Ala Lys Gln Leu Val Gly
435 440 445

ACAAGAAGAA GAACCAACTA AAGGATTGTG TAATTAATGT AAAACTGTTT TATCTTATCA 1406

AAACAATGTA TACAACATCT CATTTAAAAA CGAGATCAAT CC 1448

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 446 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

Met Gly His Gln Asn Ala Ala Val Ser Glu Asn Gln Asn His Asp Asp
1 5 10 15

Gly Ala Ala Ser Ser Pro Gly Phe Lys Leu Val Gly Phe Ser Lys Phe
20 25 30

Val Arg Lys Asn Pro Lys Ser Asp Lys Phe Lys Val Lys Arg Phe His
35 40 45

His Ile Glu Phe Trp Cys Gly Asp Ala Thr Asn Val Ala Arg Arg Phe
50 55 60

Ser Trp Gly Leu Gly Met Arg Phe Ser Ala Lys Ser Asp Leu Ser Thr
65 70 75 80

Gly Asn Met Val His Ala Ser Tyr Leu Leu Thr Ser Gly Asp Leu Arg
85 90 95

Phe Leu Phe Thr Ala Pro Tyr Ser Pro Ser Leu Ser Ala Gly Glu Ile
100 105 110

Lys Pro Thr Thr Thr Ala Ser Ile Pro Ser Phe Asp His Gly Ser Cys
115 120 125

Arg Ser Phe Phe Ser Ser His Gly Leu Gly Val Arg Ala Val Ala Ile
130 135 140

Glu Val Glu Asp Ala Glu Ser Ala Phe Ser Ile Ser Val Ala Asn Gly
145 150 155 160

Ala Ile Pro Ser Ser Pro Pro Ile Val Leu Asn Glu Ala Val Thr Ile
165 170 175

Ala Glu Val Lys Leu Tyr Gly Asp Val Val Leu Arg Tyr Val Ser Tyr
180 185 190

Lys Ala Glu Asp Thr Glu Lys Ser Glu Phe Leu Pro Gly Phe Glu Arg
195 200 205

Val Glu Asp Ala Ser Ser Phe Pro Leu Asp Tyr Gly Ile Arg Arg Leu
210 215 220

Asp His Ala Val Gly Asn Val Pro Glu Leu Gly Pro Ala Leu Thr Tyr
225 230 235 240

Val Ala Gly Phe Thr Gly Phe His Gln Phe Ala Glu Phe Thr Ala Asp
245 250 255

Asp Val Gly Thr Ala Glu Ser Gly Leu Asn Ser Ala Val Leu Ala Ser
260 265 270

Asn Asp Glu Met Val Leu Leu Pro Ile Asn Glu Pro Val His Gly Thr
275 280 285

Lys Arg Lys Ser Gln Ile Gln Thr Tyr Leu Glu His Asn Glu Gly Ala
290 295 300

Gly Leu Gln His Leu Ala Leu Met Ser Glu Asp Ile Phe Arg Thr Leu
305 310 315 320

Arg Glu Met Arg Lys Arg Ser Ser Ile Gly Gly Phe Asp Phe Met Pro
325 330 335

Ser Pro Pro Pro Thr Tyr Tyr Gln Asn Leu Lys Lys Arg Val Gly Asp
340 345 350

Val Leu Ser Asp Asp Gln Ile Lys Glu Cys Glu Glu Leu Gly Ile Leu
355 360 365

Val Asp Arg Asp Asp Gln Gly Thr Leu Leu Gln Ile Phe Thr Lys Pro
370 375 380

Leu Gly Asp Arg Pro Thr Ile Phe Ile Glu Ile Ile Gln Arg Val Gly
385 390 395 400

Cys Met Met Lys Asp Glu Glu Gly Lys Ala Tyr Gln Ser Gly Gly Cys
405 410 415

Gly Gly Phe Gly Lys Gly Asn Phe Ser Glu Leu Phe Lys Ser Ile Glu
420 425 430

Glu Tyr Glu Lys Thr Leu Glu Ala Lys Gln Leu Val Gly
435 440 445

(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 513 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Vernonia galamenensis

(vii) IMMEDIATE SOURCE:

(B) CLONE: vs1.pk0015.b2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

CCACACCGAT TGCCGGAACT TCACCGCCTC TCACGGCCTT GCAGTCCGAG CAATCGCCAT 60

TGAAGTCGAT GACGCCGAAT TAGCTTTCTC CGTCAGCGTC TCTCACGGCG CTAAACCCTC 120

CGCTGCTCCT GTAACCCTTG GAAACAACGA CGTCGTATTG TCTGAAGTTA AGCTTTACGG 180

CGATGTCGCT TTCCGGTACA TAAGTTACAA AAATCCGAAC TATACATCTT CCTTTTTGCC 240

CGGGTTCGAG CCCGTTGAAA AGACGTCGTC GTTTTATGAC CTTGACTACG GTATCCGCCG 300

TTTGGACCAC CCCGTAGGNA ACGTCCCTGA GCTTGCTTCG GCAGTGGACT ACGTGAAATC 360

ATTCACCGGA TTCCATGAGT TCGCCGAATT CACCGCGGAG GACGTCGGGA CGAGCGAGAG 420

GGAACTGAAT TCGGTCGTTT TAGCTTGCAA CAGTGAGATG GTCTTGATTC CGATGAACGA 480

GCCGGTGTAC GGAANAAAAG GAAGNAGCCA GAT 513

CLAIMS

1. An isolated nucleic acid fragment encoding a plant p-hydroxyphenylpyruvate
dioxygenase enzyme, the fragment comprising a nucleotide sequence selected
from the group consisting of

nucleotide sequences encoding a polypeptide comprising the amino
acid sequences set forth in SEQ ID NO:3, SEQ ID NO:11, SEQ ID
NO:13, and SEQ ID NO:15 and

modified nucleotide sequences essentially similar to the nucleotide
sequences of SEQ ID NO:2, SEQ ID NO:10, SEQ ID NO:12 and
SEQ ID NO:14 containing deletions, insertions, or substitutions in
the sequence that do not affect the functional properties of the
encoded protein.

2. An isolated nucleic acid fragment encoding a plant p-hydroxyphenylpyruvate
dioxygenase enzyme, the fragment comprising a nucleotide sequence as
set forth in SEQ ID NO:14.

3. A chimeric gene comprising the nucleic acid fragment of Claims 1 or
2 operably linked to at least one suitable regulatory sequence.

4. The chimeric gene of Claim 3 wherein at least one suitable regulatory
sequence directs gene expression in a microorganism.

5. The chimeric gene of Claim 3 wherein the at least one suitable
regulatory sequence directs gene expression in a plant.

6. A plasmid vector comprising the nucleic acid fragment of Claims 1 or
2 operably linked to at least one suitable regulatory sequence.

7. A transformed host cell comprising a host cell and the plasmid vector
of Claim 6.

8. The transformed host cell of Claim 7 wherein the host cell is derived
from a plant or is a microorganism.

9. The transformed host cell of Claim 8 wherein the microorganism is
E. coli.

10. A transformed plant tolerant to contact with at least one compound
that inhibits the rate of the reaction of p-hydroxyphenylpyruvate dioxygenase
enzyme in a non-transformed plant, the transformed plant comprising the chimeric
gene of Claim 3 and a host plant.

11. The transformed plant of Claim 10 wherein the host plant is a cereal
crop plant.

12. A method to identify a compound useful for its ability to inhibit the
rate of the reaction of p-hydroxyphenylpyruvate dioxygenase enzyme comprising:

(a) transforming a host cell with the plasmid vector of Claim 6;

(b) facilitating expression of the nucleic acid fragment encoding the
plant p-hydroxyphenylpyruvate dioxygenase enzyme;

(c) contacting the expressed enzyme from step (b) with a test
compound; and

(d) evaluating the capacity of the test compound to inhibit the rate of
the reaction of p-hydroxyphenylpyruvate dioxygenase enzyme.

13. The method of Claim 12 wherein evaluating the capacity of the test
compound to inhibit the rate of the reaction of p-hydroxyphenylpyruvate
dioxygenase enzyme is accomplished by measuring oxygen utilization, carbon
dioxide release, homogentisate production, loss of p-hydroxyphenylpyruvate or
maleylacetoacetate production.

14. The method of Claim 12 wherein the transformed host cell is an
E. coli that comprises a chimeric gene encoding a plant p-hydroxyphenylpyruvate
dioxygenase enzyme.

15. A compound that inhibits the activity of a plant p-hydroxyphenylpyruvate
dioxygenase enzyme, the compound identified by the method of
Claim 14.

16. A method for imparting tolerance to a plant to at least one compound
that inhibits the rate of reaction of p-hydroxyphenylpyruvate dioxygenase enzyme
comprising:

(a) transforming a host plant cell with a chimeric gene comprising a
nucleic acid fragment encoding plant p-hydroxyphenylpyruvate
dioxygenase; and

(b) expressing the chimeric gene in an amount effective to render
the transformed plant substantially tolerant to the at least one
compound that inhibits the rate of reaction of p-hydroxyphenylpyruvate
dioxygenase.

17. A method for the microbial production of active plant
p-hydroxyphenylpyruvate dioxygenase enzyme comprising:

(a) stably transforming a microorganism with the chimeric gene of
Claim 4 encoding the plant p-hydroxyphenylpyruvate
dioxygenase;

(b) facilitating expression by the chimeric gene for a suitable period;
and

(c) recovering active plant p-hydroxyphenylpyruvate dioxygenase
enzyme.

18. A method to overexpress p-hydroxyphenylpyruvate dioxygenase
enzyme in a plant comprising:

(a) stably transforming a host plant cell with a chimeric DNA
molecule comprising at least one copy of a suitable promoter to
drive expression of an associated coding sequence in a plant cell
operably linked to at least one copy of a homologous or
heterologous coding sequence encoding p-hydroxyphenylpyruvate
dioxygenase; and

(b) growing the transformed host plant cell of step (a).

19. The method of Claim 18 wherein the chimeric DNA molecule is the
chimeric gene of Claim 5.

20. An isolated nucleic acid fragment comprising a member selected from
the group consisting of:

(a) an isolated nucleic acid fragment as set forth in SEQ ID NO:16;

(b) an isolated nucleic acid fragment that is essentially similar to an
isolated nucleic acid fragment as set forth in SEQ ID NO:16;
and

(c) an isolated nucleic acid fragment that is complementary to (a) or
(b).

FIG. 1

1 CAAGAAACGNGTCGNCGACGTGCTCAGCGATGATCAGATCAAGGAGTGTGAGGAATTAGG
61 GATTCTTNTAGACAGAGATGATCAAGGGACGTTNCTTCAAATCTNCACAAAACCACTAGG
121 TGACAGGCCGACGNTATTTATAGAGATAATCCAGAGNGTAGGATGCATGATGAAAGATGT
181 GGAAGGGANGGCTTACCAGAGTGGAGNATNTNGTGGTTTTGGCAAAGGCAATT

FIG. 2



1 TGAAATCAATGGGCCACCAAAACGCCGCCGTTTCAGAGAATCAAAACCATGATGACGGCG
61 CTGCGTCGTCGCCGGGATTCAAGCTCGTCGGATTTTCCAAGTTCGTAAGAAAGAATCCAA
121 AGTCTGATAAATTCAAGGTTAAGCGCTTCCATCACATCGAGTTCTGGTGCGGGGACGCAA
Eco47III
181 CCAACGTCGCTCGTCGCTTCTCCTGGGGTCTGGGGATGAGATTCTCCGCCAAATCCGATC
241 TTTCCACCGGAAACATGGTTCACGCCTCTTACCTACTCACCTCCGGTGAACTCCGATTCC
301 TTTTCACTGCTCCTTACTCTCCGTCTCTCTCCGGCGGAGAGATTAAACCGACAACCACAG
361 GTTCTATCCCAAGTTTCGATCACGGGTCTTGTCGGTCCTTCTTCTCTTCACATGGTCTCG
421 GTGTTAGACCCGTTGCGATTGAAGTAGAAGACGCGGAGTCAGCTTTCTCCATCAGTGTAG
481 CTAATGGCGCTATTCCTTCGTCGCCTCCTATCGTCCTCAATGAAGCAGTTACGATCGCTG
541 AGGTTAAACTATACGGCGATGTTGTTCTCCGATATGTTAGTTACAAAGCAGAAGATACCG
601 AAAAATCCGAATTCTTGCCAGGGTTCGAGCGTGTAGAGGATGCGTCGTCGTTCCCATTGG
661 ATTATGGTATCCGGCGGCTTGACCACGCCGTGGGAAACGTTCCTGAGCTTGGTCCGGCTT
721 TAACTTATGTAGCGGGGTTCACTGGTTTTCACCAATTCGCAGAGTTCACAGCAGACGACG
841 TTCTACCGATTAACGAGCCAGTGCACGGAACAAAGAGGAAGAGTCAGATTCAGACGTATT
901 TGGAACATAACGAAGGCGCAGGGCTACAACATCTGGCTCTGATGAGTGAAGACATATTCA
961 GGACCCTGAGAGAGATGAGGAAGAGGAGCAGTATTGGAGGATTCGACTTCATGCCTTCTC
1021 CTCCGCCTACTTACTACCAGAATCTCAAGAAACGGGTCGGCGACGTGCTCAGCGATGATC
1081 AGATCAAGGAGTGTGAGGAATTAGGGATTCTTGTAGACAGAGATGATCAAGGGACGTTGC
1141 TTCAAATCTTCACAAAACCACTAGGTGACAGGCCGACGATATTTATAGAGATAATCCAGA
1201 GAGTAGGATGCATGATGAAAGATGAGGAAGGGAAGGCTTACCAGAGTGGAGGATGTGGTG
1261 GTTTTGCCAAAGGCAATTTCTCTGAGCTCTTCAAGTCCATTGAAGAATACGAAAAGACTC
1321 TTGAAGCCAAACAGTTAGTGGGATGAACAAGAAGAAGAACCAACTAAAGGATTGTGTAAT
1381 TAATGTAAAACTGTTTTATCTTATCAAAACAATGTATACAACATCTCATTTAAAAACGAG
1441 ATCAATCC
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FIG.3A 



            1                                                   50
Arabidopsis MGHQNAAVS  ENQNHDDGAA SSPGFKLVGF SKFVRKNPKS DKFKVKRFHH
Corn        MPPTPTAAAA GAAVAAASAA EQAAFRLVGH RNFVRFNPRS DRFHTLAFHH
Rat                                            YWDKGPKP ERGRFLHFHS
Mouse                                      M TTYMNKGPKP ERGRFLHFHS
Human                                      M TTYSDKGAKP ERGRFLHFHS
Pig                                        M TSYSDKGEKP ERGRFLHFHS

            51                                                 100
Arabidopsis IEFWCGDATN VARRFSWGLG MRFSAKSDLS TGNMVHASYL LTSGDLRFLF
Corn        VELWCADAAS AAGRFSFGLG APLAARSDLS TGNSAHASLL LRSGSLSFLF
Rat         VTFWVGNAKQ AASFYCNKMG FEPLAYKGLE TGSREVVSHV IKQGKIVFVL
Mouse       VTFWVGNAKQ AASFYCNKMG FEPLAYRGLE TGSREVVSHV IKRGKIVFVL
Human       VTFWVGNAKQ AASFYCSKMG FEPLAYRGLE TGSREVVSHV IKQGKIVFVL
Pig         VTFWVGNAKQ AASYYCSKIG FEPLAYKGLE TGSREVVSHV VKQDKIVFVF

            101                                                150
Arabidopsis TAPYSPSLSA GEIKPTTTAS IPSFDHGSCR SFFSSHGLGV RAVAIEVEDA
Corn        TAPYAHGADA ATAA       LPSFSAAAAR RFAADHGLAV RAVALRVADA
Rat         CSALNPW    NKEMG                 DHLVKHGDGV KDIAFEVEDC
Mouse       CSALNPW    NKEMG                 DHLVKHGDGV KDIAFEVEDC
Human       SSALNPW    NKEMG                 DHLVKHGDGV KDIAFEVEDC
Pig         SSALNPW    NKEMG                 DHLVKHGDGV KDIAFEVEDC

            151                                                200
Arabidopsis ESAFSISVAN GAIPSSPPIV LNEAVTIAEV KLYGDVVLRY VSYKAEDTEK
Corn        EDAFRASVAA GARPAFGPVD LGRGFRLAEV ELYGDVVLRY VSY.PDGAAG
Rat         EHIVQKARER GAKIVREPWV EEDKFGKVKF AVLQTYGDTT HTLVEKINYT
Mouse       DHIVQKARER GAKIVREPWV EQDKFGKVKF AVLQTYGDTT HTLVEKINYT
Human       DYIVQKARER GAKIMREPWV EQDKFGKVKF AVLQTYGDTT HTLVEKMNYI
Pig         DYIVQKARER GAIIVREPWI EQDKFGKVKF AVLQTFGDTT HTLVEKMNYT

            201                                                250
Arabidopsis SEFLPGFER. .VEDASSFP  LDYGIRRLDH AVGNVP..EL GPALTYVAGF
Corn        EPFLPGFEG. .V..ASPGA  ADYGLSRFDH IVGNVP..EL APAAAYFAGF
Rat         GRFLPGFEAP TYKDTLLPKL PSCNLEIIDH IVGNQPDQEM ESASEWYLKN
Mouse       GRFLPGFEAP TYKDTLLPKL PRCNLEIIDH IVGNQPDQEM QSASEWYLKN
Human       GQFLPGYEPP AFMDPLLPKL PKCSLEMIDH IVGNQPDQEM VSASEWYLKN
Pig         GCFLPGFEAP TFTDPLLSKL PKCGLEIIDH IVGNQPDQEM ESASQWYMRN

            251                                                300
Arabidopsis TGFHQFAEFT ADDVGTAESG LNSAVLASND EMVLLPINEP VHGTKRKSQI
Corn        TGFHEFAEFT TEDVGTAESG LNSMVLANNS ENVLLPLNEP VHGTKRRSQI
Rat         LQFHRFWSVD DTQVHTEYSS LRSIVVANYE ESIKMPINEP APG.RKKSQI
Mouse       LQFHRFWSVD DTQVHTEYSS LRSIVVTNYE ESIKMPINEP APG.RKKSQI
Human       LQFHRFWSVD DTQVHTEYSS LRSIVVANYE ESIKMPINEP APG.KKKSQI
Pig         LQFHRFWSVD DTQIHTEYSA LRSVVMANYE ESIKMPINEP APG.KKKSQI

FIG. 3B

            301                                                350
Arabidopsis QTYLEHNEGA GLQHLALMSE DIFRTLREMR KRSSIGGFDF MPSPPPTYYQ
Corn        QTFLDHHGGP GVQHMALASC DVLRTLREMQ ARSAMGGFEF MAFPTSDYYD
Rat         QEYVDYNGGA GVQHIALRTE DIITTIRHLR ER....GMEF LAVP.SSYYR
Mouse       QEYVDYNGGA GVQHIALKTE DIITAIRHLR ER....GTEF LAAP.SSYYK
Human       QEYVDYNGGA GVQHIALKTE DIITAIRHLR ER....GLEF LSVP.STYYK
Pig         QEYVDYNGGA GVQHIALKTE DIITAIRSLR ER....GVEF LATP.FTYYK

            351                                                400
Arabidopsis NLKK..RVGD VLSDDQIKEC EELGILVDRD DQGTLLQIFT KPLGDRPTIF
Corn        GVRR..RAGD VLTEAQIKEC QELGVLVDRD DQGVLLQIFT KPVGDRPTLF
Rat         LLRENLKTSK IQVKENMDVL EELKILVDYD EKGYLLQIFT KPMQDRPTLF
Mouse       LLRENLKSAK IQVKESMDVL EELHILVDYD EKGYLLQIFT KPMQDRPTLF

TITLE

PLANT GENE FOR p-HYDROXYPHENYLPYRUVATE DIOXYGENASE

FIELD OF THE INVENTION
This invention relates to the isolation and modification of nucleic acid
encoding p-hydroxyphenylpyruvate dioxygenase enzyme from plants. These
nucleic acid sequences were used to establish methods of identification of new
herbicidal compounds that inhibit the activity of this enzyme, and to prepare new
crop plants that are tolerant to the herbicidal action of inhibitors of this enzyme.
Chimeric genes comprising nucleic acid fragments containing all or part of the
nucleic acid sequences encoding p-hydroxyphenylpyruvate dioxygenase may be
used to produce active plant p-hydroxyphenylpyruvate dioxygenase enzyme in
microorganisms, and to cause the production of modified forms of the enzyme in
plants that may render such plants tolerant to inhibitors of the enzyme.

BACKGROUND OF THE INVENTION 

Bleaching herbicides affect plant chloroplasts by decreasing their
chlorophyll and carotenoid content. Several bleaching herbicides are known to
inhibit the enzyme phytoene desaturase, resulting in the accumulation of phytoene
in treated plants. However, compounds of the benzoyl cyclohexane-1,3-dione
type cause the accumulation of phytoene in plants but are not inhibitors of
phytoene desaturase in vitro (Sandmann, G., et al. (1990) Pestic. Sci. 30:353-355).
Subsequent work revealed that these compounds are effective inhibitors of
p-hydroxyphenylpyruvate dioxygenase (p-hydroxyphenylpyruvate:oxygen
oxidoreductase, EC 1.13.11.27), a key enzyme in the biosynthesis of
plastoquinones and tocopherols (Schulz, A., et al. (1993) FEBS Lett.
318:162-166). Based on the observation that phytoene desaturase requires a
quinone as an electron acceptor, these authors postulated that by inhibiting
p-hydroxyphenylpyruvate dioxygenase, these herbicides act indirectly on
phytoene desaturase by blocking the biosynthesis of quinones.

The proposal that p-hydroxyphenylpyruvate dioxygenase is essential for
carotenoid biosynthesis has received support from genetic studies in the plant
model system Arabidopsis thaliana. Mutations in the pds1 and pds2 genetic loci
result in mutant plants that accumulate phytoene. However, genetic mapping of
these mutant genes indicates that they do not correspond to the gene encoding the
enzyme phytoene desaturase. The pds1 mutation can be rescued by homogentisic
acid, the product of p-hydroxyphenylpyruvate dioxygenase. Therefore, this
mutation corresponds to a defect in the activity of p-hydroxyphenylpyruvate
dioxygenase (Norris, S. R., et al. (1995) Plant Cell 7:2139-2149).

In light of these disclosures, p-hydroxyphenylpyruvate dioxygenase is a
promising new target for new herbicidal compounds. Research aimed at
discovering new herbicides based on this mode of action would be greatly
facilitated by the isolation of the plant gene encoding this enzyme and by the
functional expression of this gene in transgenic organisms. For example, active
enzyme produced in recombinant microorganisms could be used to establish
screening methods for the identification of novel active compounds and to obtain
structural and mechanistic information useful to guide further chemical synthesis.
Furthermore, isolation of this gene would facilitate research aimed at generating
mutant, herbicide-tolerant versions of the enzyme that may confer herbicide
resistance to transgenic plants.

A partial sequence of an Arabidopsis thaliana cDNA with homology to
corresponding mammalian sequences encoding p-hydroxyphenylpyruvate
dioxygenase has been identified (GenBank Accession No. T20952), but this
truncated sequence is insufficient to identify an active plant
p-hydroxyphenylpyruvate dioxygenase. WO 96/38567 A2 addresses the utility that
would be attached to a DNA sequence of a p-hydroxyphenylpyruvate dioxygenase
gene, but there is no biochemical evidence of function associated with the
sequences disclosed.

SUMMARY OF THE INVENTION

This invention pertains to the isolation and characterization of nucleic acid
fragments encoding plant p-hydroxyphenylpyruvate dioxygenase enzymes. More
specifically, this invention pertains to isolated nucleic acid fragments encoding the
p-hydroxyphenylpyruvate dioxygenase enzymes from Arabidopsis thaliana and
Zea mays.

This invention also pertains to the production of active plant
p-hydroxyphenylpyruvate dioxygenase enzyme in E. coli. In one embodiment, a
chimeric gene comprising a nucleic acid fragment encoding a polypeptide that
possesses p-hydroxyphenylpyruvate dioxygenase activity, operably linked to
regulatory sequences that direct gene expression in E. coli, is claimed. In another
embodiment, a plasmid vector comprising said chimeric gene is disclosed. In yet
another embodiment, a transformed E. coli comprising a chimeric gene consisting
of a nucleic acid fragment encoding a polypeptide that possesses
p-hydroxyphenylpyruvate dioxygenase activity is disclosed.

This invention also pertains to a method of identifying substances that
inhibit the rate of the reaction of p-hydroxyphenylpyruvate dioxygenase enzyme.
In one embodiment, the invention pertains to an assay for the detection of
inhibitors of p-hydroxyphenylpyruvate dioxygenase wherein a polypeptide
derived from a transformed E. coli that displays p-hydroxyphenylpyruvate
dioxygenase activity is incubated in the presence of a test substance. Following
incubation, p-hydroxyphenylpyruvate dioxygenase enzymatic activity is measured,
wherein a reduction of enzymatic activity is indicative of the inhibitory capacity
of the test substance. Enzymatic activity can be measured by any appropriate
means, including but not limited to oxygen utilization, carbon dioxide release,
homogentisate production, and loss of p-hydroxyphenylpyruvate. Results are
quantified by radiometric, colorimetric or chromatographic means.
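
As a rough illustration of how "a reduction of enzymatic activity" might be
expressed as a number, the sketch below computes percent inhibition from a
control rate and a rate measured in the presence of a test substance. The
function name, the units and the example rates are assumptions made for
illustration only; they are not values or procedures taken from this
disclosure.

```python
def percent_inhibition(control_rate: float, treated_rate: float) -> float:
    """Return the percent reduction of enzyme activity relative to a control.

    Both rates must be in the same (arbitrary) units, e.g. product formed
    per minute. 0 means no inhibition; 100 means activity was abolished.
    """
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return 100.0 * (1.0 - treated_rate / control_rate)


# Hypothetical example rates (arbitrary units), not measured data.
control = 2.0   # activity without test substance
treated = 0.5   # activity with test substance
print(percent_inhibition(control, treated))
```

Any of the readouts listed above (oxygen utilization, carbon dioxide release,
homogentisate production, substrate loss) could in principle supply the two
rates, provided both are measured the same way.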

In another embodiment, this invention pertains to plants that are
substantially tolerant to the application of at least one compound that inhibits the
rate of the reaction of p-hydroxyphenylpyruvate dioxygenase. Plants may be
rendered tolerant by overexpression of the wild-type p-hydroxyphenylpyruvate
dioxygenase, by expression of a naturally occurring resistant variant of this
enzyme, or by expression of an altered form of p-hydroxyphenylpyruvate
dioxygenase that is resistant to the action of compounds that are inhibitory to the
wild-type enzyme.

A further embodiment of the invention is an isolated nucleic acid fragment
comprising a member selected from the group consisting of:

(a) an isolated nucleic acid fragment as set forth in SEQ ID NO:16;

(b) an isolated nucleic acid fragment that is essentially similar to an
isolated nucleic acid fragment as set forth in SEQ ID NO:16;
and

(c) an isolated nucleic acid fragment that is complementary to (a) or
(b).

BRIEF DESCRIPTION OF THE
DRAWINGS AND SEQUENCE DESCRIPTIONS
The invention can be more fully understood from the following detailed
description and the accompanying drawings and the sequence descriptions which
form a part of this application.

Figure 1 presents a partial nucleic acid sequence of an expressed sequence
tag (EST) bearing GenBank Accession No. T92052 obtained from an Arabidopsis
thaliana cDNA library. This sequence was contained in clone 91B13T7 of the
library.

Figure 2 presents the nucleic acid sequence of the cloned cDNA encoding a
full-length form of Arabidopsis thaliana p-hydroxyphenylpyruvate dioxygenase
enzyme, as it was initially determined (SEQ ID NO:2). Translation start and stop
codons are underlined. Selected restriction sites are indicated.

Figure 3 presents the amino acid sequence comparison between full-length
p-hydroxyphenylpyruvate dioxygenases from Arabidopsis thaliana (SEQ ID
NO:15) and Zea mays (SEQ ID NO:11) and the p-hydroxyphenylpyruvate
dioxygenase enzymes derived from human (SEQ ID NO:6, GenBank Acc.
No. U29895), pig (SEQ ID NO:7, GenBank Acc. No. D13390), mouse (SEQ ID
NO:8, GenBank Acc. No. D29987) and rat (SEQ ID NO:9, GenBank Acc.
No. M18405). Asterisks indicate amino acid residues that are conserved across all
six species. This figure was created using the Pileup program of GCG (Program
Manual for the Wisconsin Package, Version 9.0-OpenVMS, December 1996,
Genetics Computer Group, 575 Science Drive, Madison, WI, USA 53711).

Figure 4 is a diagram describing the construction of the intermediate 
plasmid vector pT7BlueR + PDOl. 

Figure 5 is a diagram describing the construction of E. coli expression
vector pE24CP1.

Applicants have provided a sequence listing in conformity with "Rules for
the Standard Representation of Nucleotide and Amino Acid Sequences in Patent
Applications" (Annexes I and II to the Decision of the President of the EPO,
published in Supplement No. 2 to OJ EPO, 12/1992) and with 37 C.F.R.
1.821-1.825 and Appendices A and B ("Requirements for Application Disclosures
Containing Nucleotides and/or Amino Acid Sequences").

SEQ ID NO:1 presents a partial nucleic acid sequence of an expressed
sequence tag (EST) bearing GenBank Accession No. T92052 obtained from an
Arabidopsis thaliana cDNA library. This sequence was contained in clone
91B13T7 of the library.

SEQ ID NO:2 presents the initial determination of the nucleic acid sequence
and the deduced amino acid sequence of a cDNA encoding a full-length form of
Arabidopsis p-hydroxyphenylpyruvate dioxygenase enzyme, as
contained in plasmid pGBPPD2.

SEQ ID NO:3 presents the initially deduced amino acid sequence encoded
by a cDNA for Arabidopsis thaliana p-hydroxyphenylpyruvate dioxygenase
enzyme.

SEQ ID NOS:4 and 5 present the nucleotide sequences of a pair of
complementary oligonucleotides (CAM 32 and CAM 33, respectively) used to
facilitate subcloning and expression of the gene encoding
p-hydroxyphenylpyruvate dioxygenase without the chloroplast transit sequence.

SEQ ID NO:6 presents the amino acid sequence of p-hydroxyphenylpyruvate
dioxygenase enzyme derived from human (GenBank Acc. No. U29895).

SEQ ID NO:7 presents the amino acid sequence of p-hydroxyphenylpyruvate
dioxygenase enzyme derived from pig (GenBank Acc. No. D13390).

SEQ ID NO:8 presents the amino acid sequence of p-hydroxyphenylpyruvate
dioxygenase enzyme derived from mouse (GenBank Acc. No. D29987).

SEQ ID NO:9 presents the amino acid sequence of p-hydroxyphenylpyruvate
dioxygenase enzyme derived from rat (GenBank Acc. No. M18405).

SEQ ID NO:10 presents the nucleic acid sequence and deduced amino acid
sequence of the cloned cDNA encoding the Zea mays p-hydroxyphenylpyruvate
dioxygenase enzyme, as contained in plasmid pMPDO.

SEQ ID NO:11 presents the deduced amino acid sequence of the cloned
cDNA encoding the Zea mays p-hydroxyphenylpyruvate dioxygenase enzyme, as
contained in plasmid pMPDO.

SEQ ID NO:12 presents the nucleic acid sequence and the deduced amino
acid sequence of the truncated form of Arabidopsis thaliana
p-hydroxyphenylpyruvate dioxygenase enzyme as contained in pE24CP1.

SEQ ID NO:13 presents the deduced amino acid sequence of the truncated
form of Arabidopsis thaliana p-hydroxyphenylpyruvate dioxygenase enzyme as
contained in pE24CP1.

SEQ ID NO:14 presents the revised nucleic acid sequence and the deduced
amino acid sequence of the cloned cDNA encoding the full-length Arabidopsis
thaliana p-hydroxyphenylpyruvate dioxygenase enzyme, as contained in plasmid
pGBPPD2.

SEQ ID NO:15 presents the revised amino acid sequence deduced from the
cDNA for the full-length Arabidopsis thaliana p-hydroxyphenylpyruvate
dioxygenase enzyme.

SEQ ID NO:16 presents the nucleic acid sequence determined from a
portion of a cDNA from Vernonia galamenensis, as contained in clone
vs1.pk0015.b2.
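
Several of the entries above refer to a "deduced amino acid sequence". As a
minimal sketch of what that deduction involves, the code below reads a DNA
fragment three bases at a time and looks each codon up in the standard genetic
code. The function name is hypothetical and the table is deliberately limited
to the codons in the sample; the sample itself is a ten-codon stretch of the
coding sequence listed under SEQ ID NO:14.

```python
# Partial codon table (standard genetic code); only the codons needed for
# the sample fragment are included, so other codons raise KeyError.
CODON_TO_AA = {
    "AAG": "Lys", "TTC": "Phe", "GTA": "Val", "AGA": "Arg",
    "AAT": "Asn", "CCA": "Pro", "TCT": "Ser", "GAT": "Asp",
}


def translate(dna: str) -> list[str]:
    """Translate a DNA string codon by codon; spaces are ignored."""
    seq = dna.replace(" ", "").upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3)]


fragment = "AAG TTC GTA AGA AAG AAT CCA AAG TCT GAT"
print(" ".join(translate(fragment)))
# prints: Lys Phe Val Arg Lys Asn Pro Lys Ser Asp
```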

DETAILS OF THE INVENTION
BIOLOGICAL DEPOSITS

The following biological materials have been deposited under the terms of
the Budapest Treaty at American Type Culture Collection (ATCC), 12301
Parklawn Drive, Rockville, MD 20852, and bear the following accession
numbers:
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PCT/US97/11295 



Depositor Identification 
Host Strain Plasmid 



Int'l. Depository* 
Accession Number 



Date of Deposit 
June 25, 1996 
June 25, 1996 
June 12, 1997



E. coli BL21(DE3) pE24CP1
N/A pGBPPD2 
N/A pMPDO 



ATCC 209120 



ATCC 97622 



ATCC 98083 



Definitions 

In the context of this disclosure, a number of terms shall be utilized. As
used herein, the term "nucleic acid" refers to a large molecule which can be
single-stranded or double-stranded, composed of monomers (nucleotides)
containing a sugar, phosphate and either a purine or pyrimidine. A "nucleic acid
fragment" is a portion of a given nucleic acid molecule. As used herein, "DNA"
(deoxyribonucleic acid) is the genetic material, whereas "RNA" (ribonucleic acid)
is involved in the transfer of the information encoded by the DNA into proteins
and polypeptides. A "genome" is the entire body of genetic material contained in
each cell of an organism. The term "nucleotide sequence" refers to a polymer of
DNA or RNA which can be single- or double-stranded, optionally containing
synthetic, non-natural or altered nucleotide bases capable of incorporation into
DNA or RNA polymers.

As used herein, "essentially similar" refers to DNA sequences that may
involve base changes that do not cause a change in the encoded amino acid, or
which involve base changes that may alter one or more amino acids but do not
affect the functional properties of the protein encoded by the DNA sequence. It is
therefore understood that the invention encompasses more than the specific
exemplary sequences. Modifications to the sequence, such as deletions,
insertions, or substitutions in the sequence which produce "silent changes" (i.e.,
those that do not substantially affect the functional properties of the resulting
protein molecule) are also contemplated. For example, alteration(s) in the gene
sequence which reflect the degeneracy of the genetic code, or which result in the
production of a chemically equivalent amino acid at a given site, are
contemplated; thus, a codon for the amino acid alanine, a hydrophobic amino acid,
may be substituted by a codon encoding another less hydrophobic residue, such as
glycine, or a more hydrophobic residue, such as valine, leucine, or isoleucine.
Similarly, changes which result in substitution of one negatively charged residue
for another, such as aspartic acid for glutamic acid, or one positively charged
residue for another, such as lysine for arginine, can also be expected to produce a
biologically equivalent product. Nucleotide changes which result in alteration of
the N-terminal and C-terminal portions of the protein molecule would also not be
expected to alter the activity of the protein. In some cases, it may in fact be
desirable to make mutants of the sequence in order to study the effect of alteration
on the biological activity of the protein. Each of the proposed modifications is
well within the routine skill in the art, as is determination of retention of
biological activity of the encoded products. Moreover, the skilled artisan
recognizes that "essentially similar" sequences encompassed by this invention are
also defined by their ability to hybridize, under stringent conditions (0.1X SSC,
0.1% SDS, 65°C), with the sequences exemplified herein.
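
The distinction drawn above between base changes that leave the encoded amino acid unchanged and those that substitute a chemically equivalent residue can be illustrated with a short sketch. It assumes the standard genetic code and two in-frame coding sequences of equal length; the residue groupings and the function names (translate, classify_change) are hypothetical choices made for illustration, and a "conservative" result does not by itself demonstrate retention of biological activity.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Illustrative (assumed) groups of chemically similar residues, following the
# examples in the text: Gly/Ala/Val/Leu/Ile, Asp/Glu, and Lys/Arg.
CONSERVATIVE_GROUPS = [set("GAVLI"), set("DE"), set("KR")]


def translate(dna):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_change(reference, variant):
    """Classify a variant as 'synonymous', 'conservative' or 'non-conservative'."""
    ref_protein, var_protein = translate(reference), translate(variant)
    if len(ref_protein) != len(var_protein):
        raise ValueError("sequences must encode the same number of residues")
    if ref_protein == var_protein:
        return "synonymous"
    for a, b in zip(ref_protein, var_protein):
        same_group = any(a in g and b in g for g in CONSERVATIVE_GROUPS)
        if a != b and not same_group:
            return "non-conservative"
    return "conservative"


if __name__ == "__main__":
    print(classify_change("GCTGAA", "GCCGAG"))  # Ala-Glu vs Ala-Glu
    print(classify_change("GCTGAA", "GTTGAT"))  # Ala-Glu vs Val-Asp
    print(classify_change("GCTGAA", "GCTAAA"))  # Ala-Glu vs Ala-Lys
```

Whether a given substitution actually preserves enzyme function must still be determined experimentally, as the text notes.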

"Gene" refers to a nucleic acid fragment that encodes a specific protein,
including regulatory sequences preceding (5' non-coding) and following (3' non-coding)
the coding region. "Native" gene refers to the gene as found in nature
with its own regulatory sequences. "Chimeric" gene refers to a gene comprising
heterogeneous regulatory and coding sequences. "Endogenous" gene refers to the
native gene normally found in its natural location in the genome. A "foreign"
gene refers to a gene not normally found in the host organism but that is
introduced by gene transfer.

"Coding sequence" refers to a DNA sequence that codes for a specific 
protein and excludes the non-coding sequences. 

"Initiation codon" and "termination codon" refer to a unit of three adjacent 
nucleotides in a coding sequence that specifies initiation and termination,
respectively, of protein synthesis (mRNA translation). "Open reading frame"
refers to the amino acid sequence encoded between translation initiation and
termination codons of a coding sequence.
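
As a minimal sketch of how the initiation and termination codons delimit an open reading frame, the following function scans a DNA sequence for the first ATG that is followed, in the same frame, by a stop codon. The function name find_first_orf and the choice to report nucleotide coordinates (rather than the encoded amino acid sequence) are assumptions made for illustration; only the forward strand is examined.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_first_orf(dna):
    """Return (start, end) of the first ATG...stop open reading frame.

    `start` is the index of the A of the initiation codon and `end` is the
    index just past the termination codon. Returns None if no initiation
    codon is followed by an in-frame termination codon.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Step codon by codon in the frame set by this ATG.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return start, i + 3
        start = dna.find("ATG", start + 1)
    return None


if __name__ == "__main__":
    seq = "CCATGGCTGAATAGGG"
    orf = find_first_orf(seq)
    if orf is not None:
        print(seq[orf[0]:orf[1]])
```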

"RNA transcript" refers to the product resulting from RNA polymerase-catalyzed
transcription of a DNA sequence. When the RNA transcript is a perfect
complementary copy of the DNA sequence, it is referred to as the primary
transcript; alternatively, it may be an RNA sequence derived from posttranscriptional
processing of the primary transcript. "Messenger RNA" (mRNA) refers to RNA
that can be translated into protein by the cell. "cDNA" refers to a double-stranded
DNA, one strand of which is complementary to and derived from mRNA by
reverse transcription. "Sense RNA" refers to RNA transcript that includes the
mRNA.

As used herein, "regulatory sequences" are nucleotide sequences, located
upstream (5'), within, or downstream (3') of a coding sequence, that control
the transcription or expression of that coding sequence, act in conjunction with
the protein biosynthetic apparatus of the cell, and include promoters, translation
leader sequences, transcription termination sequences, and polyadenylation sequences.

"Promoter" refers to a DNA sequence in a gene, usually upstream (5') to its
coding sequence, which controls the expression of the coding sequence by
providing the recognition for RNA polymerase and other factors required for
proper transcription. A promoter may also contain DNA sequences that are
involved in the binding of protein factors which control the effectiveness of
transcription initiation in response to physiological or developmental conditions.
In the case of eukaryotic organisms, it may also contain enhancer elements.

An "enhancer element" is a DNA sequence which can stimulate promoter
activity. It may be an innate element of the promoter or a heterologous element
inserted to enhance the activity level and tissue-specificity of a promoter.
"Constitutive promoters" refers to those promoters that direct gene
expression in all tissues and at all times. "Organ-specific" or "development-specific"
promoters as referred to herein are those that direct gene expression
almost exclusively in specific organs, such as leaves or seeds, or at specific
development stages in an organ, such as in early or late embryogenesis,
respectively.

The term "operably linked" refers to nucleic acid sequences on a single
nucleic acid molecule which are associated so that the function of one is affected
by the other. For example, a promoter is operably linked with a structural gene
(i.e., a gene encoding p-hydroxyphenylpyruvate dioxygenase, as disclosed herein)
when it is capable of affecting the expression of that structural gene (i.e., that the
structural gene is under the transcriptional control of the promoter).

The term "expression", as used herein, is intended to mean the production of
the protein product encoded by a gene. More particularly, "expression" refers to
the transcription and stable accumulation of the sense RNA (mRNA) derived from
the nucleic acid fragment(s) of the invention that, in conjunction with the protein
apparatus of the cell, results in altered levels of protein product.
"Overexpression" refers to the production of a gene product in transgenic
organisms that exceeds levels of production in normal or non-transformed
organisms. "Altered levels" refers to the production of gene product(s) in
transgenic organisms in amounts or proportions that differ from that of normal or
non-transformed organisms. "Facilitating expression" refers to steps and
conditions for culturing host cells containing the desirable gene to yield an
increased production of the enzyme. For example, addition of a chemical inducer
specific to the particular promoter operably linked to the gene facilitates
expression of the encoded enzyme. This is measured relative to the production
levels of an untreated gene.

The "3' non-coding sequences" refers to the DNA sequence portion of a
gene that contains a polyadenylation signal and any other regulatory signal
capable of affecting mRNA processing or gene expression. The polyadenylation
signal is usually characterized by affecting the addition of polyadenylic acid tracts
to the 3' end of the mRNA precursor.

The "translation leader sequence" refers to that DNA sequence portion of a
gene between the promoter and coding sequence that is transcribed into RNA and
is present in the fully processed mRNA upstream (5') of the translation start
codon. The translation leader sequence may affect processing of the primary
transcript to mRNA, mRNA stability, or translation efficiency.

"Transformation" herein refers to the transfer of a foreign gene into the
genome of a host organism and its genetically stable inheritance. Bacterial
transformation can proceed by any of several methods well known in the art,
including calcium chloride-mediated transformation and electroporation.
Examples of methods of plant transformation include Agrobacterium-mediated
transformation and particle-accelerated or "gene gun" transformation technology
(U.S. Patent No. 4,945,050).

"Host cell" refers to the cell that is transformed with the introduced genetic
material.

"Plasmid vector" refers to a double-stranded, closed circular, extrachromosomal
DNA molecule.

"Tolerant" or "tolerance" refers to a condition whereby a cell or an organism
is able to withstand the effect of application of a compound or composition at a
concentration or application rate that causes a demonstrable effect in or against
cells or organisms that are not tolerant. For example, the growth or survival of a
plant that is tolerant to application of a herbicidal compound or composition will
be less affected than the growth or survival of a plant that is not tolerant to
application of the herbicidal compound or composition.

Cloning of Plant Genes Encoding p-Hydroxyphenylpyruvate Dioxygenase

The p-hydroxyphenylpyruvate dioxygenases from plants are a promising
new class of targets for new herbicidal compounds. In order to be able to study
this enzyme in detail, and to have available supplies of enzyme for inhibitor
screening, cDNA clones encoding plant p-hydroxyphenylpyruvate dioxygenases
were identified. These nucleic acid fragments are useful for the production of
their encoded enzymes, for isolation of clones from additional plant sources that
encode other p-hydroxyphenylpyruvate dioxygenase enzymes, and for
understanding the biochemical and structural properties of these enzymes.

Nucleic acid fragments comprising nucleotide sequences that encode
different forms of the enzyme p-hydroxyphenylpyruvate dioxygenase from the
plant Arabidopsis thaliana have now been isolated. Subsequently, these
nucleotide sequences were expressed in E. coli cells and shown to direct the
synthesis of plant p-hydroxyphenylpyruvate dioxygenase enzymes.

An automated search of nucleotide sequences contained in a database
representing an Arabidopsis cDNA library for sequences homologous to other
known, non-plant p-hydroxyphenylpyruvate dioxygenase genes revealed the
plasmid cDNA clone 91B13T7. This cDNA was obtained from the Arabidopsis
Seed Stock Center at Ohio State University. Plasmid DNA suitable for nucleotide
sequence determination was prepared and the nucleotide sequence of the plasmid
insert was determined. The resulting sequence was not interpretable, suggesting
possible contamination of the plasmid sample by an extraneous nucleic acid. This
assumption was confirmed by digesting the plasmid DNA sample with restriction
enzymes and separating the resulting nucleic acid fragments by agarose gel
electrophoresis. This analysis revealed the presence of nucleic acid fragments that
could not be derived from the plasmid carrying the putative p-hydroxyphenylpyruvate
dioxygenase fragment. Furthermore, a search of the publicly available
nucleic acid sequence databases revealed that the Arabidopsis thaliana sequence
reported for cDNA clone 91B13T7 corresponded to a truncated cDNA (Figure 1).
Based on publicly available mammalian cDNA sequence information for
p-hydroxyphenylpyruvate dioxygenase, the minimum length expected for a cDNA
encoding a complete p-hydroxyphenylpyruvate dioxygenase enzyme is 1 kb
(Table 1).

Table 1

Predicted cDNA Length for Sequences
Encoding p-Hydroxyphenylpyruvate Dioxygenase

Organism            Amino Acid Residues    Minimum cDNA (kb)

Human               392                    1.176
Pig                 392                    1.176
Pseudomonas sp.     357                    1.071
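
The minimum cDNA lengths in Table 1 are consistent with three nucleotides per amino acid residue, with no allowance for a termination codon or untranslated regions. The arithmetic can be sketched as follows; the function name minimum_cdna_kb is a hypothetical label used only for this illustration.

```python
def minimum_cdna_kb(amino_acid_residues):
    """Minimum coding length in kb: three nucleotides per residue."""
    return amino_acid_residues * 3 / 1000


if __name__ == "__main__":
    # Residue counts taken from Table 1.
    for organism, residues in [("Human", 392), ("Pig", 392), ("Pseudomonas sp.", 357)]:
        print(f"{organism}: {minimum_cdna_kb(residues):.3f} kb")
```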



Therefore, based on the expected length of a cDNA capable of encoding a
functional p-hydroxyphenylpyruvate dioxygenase, the Arabidopsis thaliana
sequence obtained from the public database was insufficient to encode a full-length,
active p-hydroxyphenylpyruvate dioxygenase enzyme. Accordingly, a cDNA
with the capacity to encode a full-length enzyme from Arabidopsis thaliana was cloned,
as described herein. A 400 bp segment of the insert of plasmid 91B13T7 was
liberated by digestion with restriction enzymes and used to screen a cDNA library
prepared from norflurazon-treated Arabidopsis thaliana seedlings (Scolnik, P. A.,
and Bartley, G. E. (1994) Plant Physiol. 104:1469-1470). Several clones showing
positive hybridization to this probe were sequenced. The initial determination of
the sequence of the longest cDNA clone obtained from this effort is shown in
Figure 2 and in SEQ ID NO:2. During the course of subsequent work with this
clone, it became necessary to confirm certain features of the sequence. A corrected
sequence of this cDNA is presented in SEQ ID NO:12.

The sequence reported in Figure 2 indicates that this cDNA has the capacity
to encode a protein of MW 48,841 which, as shown in Figure 3, has a high level
of homology to p-hydroxyphenylpyruvate dioxygenase enzymes from other
eukaryotes.

A cDNA capable of encoding a full-length p-hydroxyphenylpyruvate
dioxygenase has also been obtained from corn. This cDNA, contained in plasmid
pMPDO, was identified in a corn cDNA library using an approximately 900 base
pair portion of the Arabidopsis cDNA as a probe. The predicted amino acid
sequence that is encoded by the corn cDNA is also compared to p-hydroxyphenylpyruvate
dioxygenase enzymes from other eukaryotes in Figure 3.

A cDNA library was prepared from messenger RNA isolated from
developing seeds of Vernonia galamensis. Random sequencing of the clones
contained in the library identified a probable clone, designated vs1.pk0015.b2, for
the p-hydroxyphenylpyruvate dioxygenase from this plant. The 513 bp expressed
sequence tag (EST) is presented in SEQ ID NO:16.

Expression of the Arabidopsis thaliana cDNA Encoding p-Hydroxyphenylpyruvate
Dioxygenase in E. coli

The nucleic acid fragments of the instant invention encoding plant
p-hydroxyphenylpyruvate dioxygenase enzymes can be operably linked to suitable
regulatory sequences, thereby creating chimeric genes that can be used to direct
expression of the enzyme in transgenic organisms. These transgenic organisms
include, but are not limited to: plants (Plant Molecular Biology; Croy, R. R. D.,
Ed.; Bios Scientific Publishers; 1993); microorganisms, including Escherichia
coli (Gold, L. (1990) Methods in Enzymology 185:11), Bacillus subtilis (Henner,
D. J. (1990) Methods in Enzymology 185:199), yeast (Gellissen, G., et al. (1992)
Antonie Leeuwenhoek 62:79), and fungi, including members of the genus
Aspergillus (Devchand, M. and Gwynne, D. I. (1991) J. Biotechnol. 17:3); and
insect cells containing recombinant baculoviruses (Luckow, V. A. and Summers,
M. D. (1988) Bio/Technology 6:47).

One skilled in the art can isolate the coding sequences from the fragments of
the invention by using or creating sites for restriction endonucleases, as described
in Sambrook, J., et al. (1989) Molecular Cloning, A Laboratory Manual, 2nd ed.;
Cold Spring Harbor Laboratory Press (hereinafter "Maniatis"). Alternatively,
polymerase chain reaction (PCR) techniques can be employed to isolate and/or
modify the fragments of the invention (Newton, C. R. and Graham, A. (1994)
PCR; Bios Scientific Publishers).

Arabidopsis p-hydroxyphenylpyruvate dioxygenase was expressed in E. coli
under control of a T7 promoter in a strain expressing T7 RNA polymerase
(Studier, F. W., et al. (1990) Methods in Enzymology 185:60). Promoters other
than T7 are commonly used in expression vectors and could be substituted for
protein expression in E. coli. Examples of alternative promoters include, but are
not limited to, trp (Yansura, D. G. and Henner, D. J. (1990) Methods in
Enzymology 185:54), PL (Remaut, E. et al. (1981) Gene 15:81), lac (Amann, E. et
al. (1983) Gene 25:167), trc (Amann, E. et al. (1988) Gene 69:301), and
promoters such as lacUV5, lpp, PR, and hybrid and tandem promoters constructed
to combine specific features to increase strength or regulation capacity (Balbas, P.
and Bolivar, F. (1990) Methods in Enzymology 185:14).

Biochemical Evidence of Enzymatic Function

The enzyme p-hydroxyphenylpyruvate dioxygenase catalyzes the reaction of
p-hydroxyphenylpyruvate with molecular oxygen to give homogentisate and CO2.
The enzyme can be assayed by measuring oxygen utilization (Hager, S. E., et al.
(1957) J. Biol. Chem. 225:935-947), CO2 release or homogentisate production
from radioactively labeled p-hydroxyphenylpyruvate (Lindblad, B. (1971) Clin.
Chim. Acta 34:113-121), loss of the p-hydroxyphenylpyruvate (Lin, E. C. C. et al.
(1958) J. Biol. Chem. 233:668-673), or formation of homogentisate using a
colorimetric assay (Fellman, J. H. et al. (1972) Biochim. Biophys. Acta
284:90-100) or UV detection following HPLC or a similar chromatographic
separation technique. The activity of p-hydroxyphenylpyruvate dioxygenase may
also be measured in a coupled assay in which the initial product, homogentisate, is
oxidized by homogentisate dioxygenase; formation of maleylacetoacetate is
determined by measuring absorbance at 330 nm (Fernandez-Canon, J. M. and
Penalva, M. A. (1997) Anal. Biochem. 245:218-221).

An alternative to any of the kinetic assays for p-hydroxyphenylpyruvate
dioxygenase is an end-point or fixed-time assay. The procedure is based on the
conversion of the remaining substrate, p-hydroxyphenylpyruvate, to its enediol
tautomer by tautomerase in the presence of borate ions and measurement of the
characteristic 308 nm peak of the tautomer (Lin, E. C. C. et al. (1958) J. Biol.
Chem. 233:668-673). The procedure involves the addition of enough
p-hydroxyphenylpyruvate dioxygenase to consume ~80% of the organic substrate
over 1 hour in 200 µL of assay buffer, which in this case is 50 mM Tris, pH 7.4,
0.10 mM p-hydroxyphenylpyruvic acid, 1.75 mM ascorbate and 1.25 mM EDTA.
After 1 hr the reaction is quenched by the addition of 100 µL of 0.8 M borate,
pH 7.3, containing 1000 ppb of a p-hydroxyphenylpyruvate dioxygenase inhibitor
and 0.25 µL of 6.1 mg/mL of tautomerase. The absorbance at 308 nm is read after
a 30 min incubation and is stable thereafter for 2 hr. The advantage of this assay
over the kinetic procedure is that the p-hydroxyphenylpyruvate dioxygenase is not
required to oxidize the substrate in the presence of high concentrations of borate, a
condition that might interfere with the mode of action of inhibitors. Furthermore,
the assay produces essentially a stable binary indication of p-hydroxyphenylpyruvate
dioxygenase inhibition, and is well-suited for applications which require
a high throughput of samples and assays.
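
One way the fixed-time readout might be reduced to a percent inhibition value is sketched below. This is an illustrative calculation rather than a procedure stated in the text: it assumes that the 308 nm absorbance is proportional to the substrate remaining at quench, and that two control wells are run alongside each sample (a no-enzyme well retaining all substrate, and an uninhibited well in which roughly 80% of the substrate has been consumed). The function names and the control scheme are hypothetical.

```python
def substrate_nmol(volume_ul=200.0, concentration_mm=0.10):
    """Substrate per reaction in nmol (mM multiplied by microliters gives nmol)."""
    return concentration_mm * volume_ul


def percent_inhibition(a308_sample, a308_no_enzyme, a308_uninhibited):
    """Percent inhibition from end-point A308 readings.

    Assumes A308 scales linearly with unconsumed substrate. A sample that
    reads like the no-enzyme control scores 100%; one that reads like the
    uninhibited control scores 0%.
    """
    span = a308_no_enzyme - a308_uninhibited
    if span <= 0:
        raise ValueError("no-enzyme control must read higher than uninhibited control")
    return 100.0 * (a308_sample - a308_uninhibited) / span


if __name__ == "__main__":
    total = substrate_nmol()  # 200 uL of 0.10 mM substrate
    print(f"substrate per well: {total:.1f} nmol; ~80% consumed: {0.8 * total:.1f} nmol")
    print(f"inhibition: {percent_inhibition(0.5, 1.0, 0.2):.1f}%")
```

Because the assay gives an essentially binary indication, a simple threshold on this value could be used to flag candidate inhibitors in a high-throughput screen.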
The enzyme encoded by the nucleic acid fragments and overexpressed in
E. coli can be extracted in any conventional buffer used for extracting soluble
plant enzymes. Although a large amount of an overexpressed protein is often
insoluble, the amount that is soluble can represent as much as 50% of
the total soluble protein. Soluble overexpressed protein has high p-hydroxyphenylpyruvate
dioxygenase activity and is easily extracted. Likewise, it may be
possible to resolubilize an insoluble overexpressed protein in an active form under
appropriate conditions, since addition of sarkosyl (sodium N-lauroylsarcosinate)
to the extraction buffer appeared to increase the amount of the overexpressed
protein extracted. For optimum activity, a reducing agent such as ascorbate or
reduced glutathione should be present as well as a source of ferrous ion.

An overexpressed enzyme can be assayed using all the techniques
described above for measuring p-hydroxyphenylpyruvate dioxygenase activity,
while only the techniques using labeled p-hydroxyphenylpyruvate can be used to
measure activity in crude plant extracts. Therefore, the availability of an
overexpressed enzyme greatly facilitates the development of high capacity screens
to identify inhibitors of the enzyme. Potential inhibitors are evaluated for their
capacity to reduce the rate of the reaction of the enzyme, resulting in reduced
oxygen uptake and CO2 release, and lower rates of formation of homogentisate
and loss of p-hydroxyphenylpyruvate. Applicants have demonstrated that at least
one of the instant nucleic acid fragments can be overexpressed in E. coli cells,
resulting in production of a protein that catalyzes the conversion of p-hydroxyphenylpyruvate
to homogentisate with the release of CO2. Furthermore, it has
been shown that this activity is inhibited by commercial herbicides known to
inhibit p-hydroxyphenylpyruvate dioxygenase. Finally, an overexpressed enzyme
can be used in a high capacity assay to identify compounds that inhibit the
enzymatic activity of p-hydroxyphenylpyruvate dioxygenase. Such compounds
may serve as herbicides.

Preparation of Plants Tolerant to Inhibitors of p-Hydroxyphenylpyruvate
Dioxygenase

This invention embodies plants which are resistant or at least tolerant to
herbicides that target the p-hydroxyphenylpyruvate dioxygenase enzyme at levels
which are normally inhibitory to the naturally occurring p-hydroxyphenylpyruvate
dioxygenase enzyme. This altered p-hydroxyphenylpyruvate dioxygenase activity
is conferred by (1) overexpression of the wild-type p-hydroxyphenylpyruvate
dioxygenase enzyme, or (2) expression of a DNA molecule encoding a herbicide-tolerant
enzyme. The said enzyme may be a modified form of a p-hydroxyphenylpyruvate
dioxygenase enzyme that occurs naturally in a eukaryote or
prokaryote, or a modified form of a p-hydroxyphenylpyruvate dioxygenase
enzyme that naturally occurs in a plant, or a herbicide-tolerant enzyme that
naturally occurs in a prokaryote (Duke et al., Herbicide Resistant Crops; Lewis:
Boca Raton; 1994). An effective amount of gene expression to render the cells of
the plant tissue substantially tolerant to the herbicide depends on whether the gene
codes for an unaltered p-hydroxyphenylpyruvate dioxygenase enzyme or a mutant or
altered form of the enzyme that is less sensitive to the herbicides. Expression of an
unaltered plant p-hydroxyphenylpyruvate dioxygenase gene in an effective
amount is that amount that provides for a 2- to 10-fold increase in herbicide
tolerance. Plants encompassed by the invention include monocotyledonous and
dicotyledonous plants. Preferred are those plants which would be potential
targets for p-hydroxyphenylpyruvate dioxygenase-inhibiting herbicides,
particularly agronomically important crops such as maize and other cereal crops.

Increased levels of expression of p-hydroxyphenylpyruvate dioxygenase
activity, from two to ten or more times the natively expressed amount, would be
sufficient to overcome growth inhibition caused by the herbicide. Plants
containing such altered p-hydroxyphenylpyruvate dioxygenase enzyme activity
can be obtained by direct selection in plants. This method is known in the art.
See, e.g., U.S. Patent No. 5,162,602, U.S. Patent No. 4,761,373, and references
cited therein.

Overexpression of p-hydroxyphenylpyruvate dioxygenase also can be
accomplished by stably transforming a host plant cell with a chimeric DNA
molecule comprising a promoter capable of driving expression of an associated
coding sequence in a plant cell and operably linked to a homologous or
heterologous coding sequence encoding p-hydroxyphenylpyruvate dioxygenase.
A "homologous" p-hydroxyphenylpyruvate dioxygenase gene is isolated from an
organism taxonomically identical to the target plant cell, whereas a "heterologous"
p-hydroxyphenylpyruvate dioxygenase gene is obtained from an organism
taxonomically distinct from the target plant.

The expression of foreign genes in plants is well-established (De Blaere et
al., (1987) Meth. Enzymol. 143:277-291). Promoters utilized to drive gene
expression in transgenic plants or plant cells (i.e., those capable of driving
expression of the associated coding sequences such as p-hydroxyphenylpyruvate
dioxygenase in plant cells) include those directing the 19S and 35S transcripts in
cauliflower mosaic virus (Odell et al., (1985) Nature 313:810-812; Hull et al.,
(1987) Virology 86:482-493), the small subunit of ribulose 1,5-bisphosphate
carboxylase (Morelli et al., (1985) Nature 315:200-204; Broglie et al., (1984)
Science 224:838-843; Herrera-Estrella et al., (1984) Nature 310:115-120; Coruzzi
et al., (1984) EMBO J. 3:1671-1679; Facciotti et al., (1985) Bio/Technology 3:241)
and chlorophyll a/b binding protein (Lamppa et al., (1986) Nature 316:750-752),
and nopaline synthase promoters (Depicker et al., (1982) J. Mol. Appl. Genet.
1:561-573; An et al., (1990) Plant Cell 2:225-233). The chimeric DNA
construct(s) of the invention may contain multiple copies of a promoter or
multiple copies of the p-hydroxyphenylpyruvate dioxygenase coding sequences.
In addition, the construct(s) may include coding sequences for selectable markers
and coding sequences for other peptides such as signal or transit peptides. The
preparation of such constructs is within the ordinary level of skill in the art.
Resistance to inhibitors of the plant carotenoid biosynthesis pathway, which is
also targeted by p-hydroxyphenylpyruvate dioxygenase inhibitors, has been
achieved by expressing a bacterial gene encoding phytoene desaturase driven by
the CaMV promoter (Misawa et al., (1994) Plant J. 6:481-490).

Transit peptides may be fused to the p-hydroxyphenylpyruvate dioxygenase
coding sequence in the chimeric DNA constructs of the invention to direct
transport of the expressed p-hydroxyphenylpyruvate dioxygenase enzyme to the
desired site of action. Examples of transit peptides include the chloroplast transit
peptides such as those described in Von Heijne et al., (1991) Plant Mol. Biol. Rep.
9:104-126; Mazur et al., (1987) Plant Physiol. 85:1110; Vorst et al., (1988) Gene
65:59; and mitochondrial transit peptides such as those described in Boutry et al.,
(1987) Nature 328:340-342.

It is envisioned that the introduction of enhancers or enhancer-like elements
into other promoter constructs will also provide increased levels of primary
transcription to accomplish the invention. These would include viral enhancers
such as that found in the 35S promoter (Odell et al., (1988) Plant Mol. Biol.
10:263-272), enhancers from the opine genes (Fromm et al., (1989) Plant Cell
1:977-984), or enhancers from any other source that result in increased
transcription when placed into a promoter operably linked to the nucleic acid
fragment of the invention.

Introns isolated from the maize Adh-1 and Bz-1 genes (Callis et al., (1987)
Genes Dev. 1:1183-1200), and intron 1 and exon 1 of the maize Shrunken-1 (sh-1)
gene (Maas et al., (1991) Plant Mol. Biol. 16:199-207) may also be of use to
increase expression of introduced genes. Results with the first intron of the maize
alcohol dehydrogenase (Adh-1) gene indicate that when this DNA element is
placed within the transcriptional unit of a heterologous gene, mRNA levels can be
increased by 6.7-fold over normal levels. Similar levels of intron enhancement
have been observed using intron 3 of a maize actin gene (Luehrsen, K. R. and
Walbot, V., (1991) Mol. Gen. Genet. 225:81-93). Enhancement of gene
expression by Adh1 intron 6 (Oard et al., (1989) Plant Cell Rep. 8:156-160) has
also been noted. Exon 1 and intron 1 of the maize sh-1 gene have been shown to
individually increase expression of reporter genes in maize suspension cultures by
10 and 100-fold, respectively. When used in combination, these elements have
been shown to produce up to 1000-fold stimulation of reporter gene expression
(Maas et al., (1991) Plant Mol. Biol. 16:199-207).

Any 3' non-coding region capable of providing a polyadenylation signal and
other regulatory sequences that may be required for proper expression can be used
to accomplish the invention. This would include the 3' end from any storage
protein such as the 3' end of the 10 kd, 15 kd, 27 kd and alpha zein genes, the 3' end
of the bean phaseolin gene, the 3' end of the soybean β-conglycinin gene, the 3'
end from viral genes such as the 3' end of the 35S or the 19S cauliflower mosaic
virus transcripts, the 3' end from the opine synthesis genes, the 3' ends of ribulose
1,5-bisphosphate carboxylase or chlorophyll a/b binding protein, or 3' end
sequences from any source such that the sequence employed provides the
necessary regulatory information within its nucleic acid sequence to result in the
proper expression of the promoter/coding region combination to which it is
operably linked. There are numerous examples in the art that teach the usefulness
of different 3' non-coding regions (for example, see Ingelbrecht et al., (1989)
Plant Cell 1:671-680).

Various methods of introducing a DNA sequence (i.e., of transforming) into
eukaryotic cells of higher plants are available to those skilled in the art (see EPO
publications 0 295 959 A2 and 0 138 341 A1). Such methods include high-velocity
ballistic bombardment with metal particles coated with the nucleic acid
constructs (see Klein et al., (1987) Nature (London) 327:70-73, and see U.S.
Patent No. 4,945,050), as well as methods using transformation vectors based on
the Ti and Ri plasmids of Agrobacterium spp., particularly the binary type of these
vectors. Ti-derived vectors transform a wide variety of higher plants, including
monocotyledonous and dicotyledonous plants, such as soybean, cotton and rapeseed
(Facciotti et al., (1985) Bio/Technology 3:241; Byrne et al., (1987) Plant
Cell Tissue and Organ Culture 8:3; Sukhapinda et al., (1987) Plant Mol. Biol.
8:209-216; Lorz et al., (1985) Mol. Gen. Genet. 199:178-182; Potrykus et al.,
(1985) Mol. Gen. Genet. 199:183-188).

Other transformation methods are available to those skilled in the art, such
as direct uptake of foreign DNA constructs (see EPO publication 0 295 959 A2),
and techniques of electroporation (see Fromm et al., (1986) Nature (London)
319:791-793). Once transformed, the cells can be regenerated by those skilled in
the art. Also relevant are several recently described methods of introducing
nucleic acid fragments into commercially important crops, such as rapeseed (see
De Block et al., (1989) Plant Physiol. 91:694-701), sunflower (Everett et al.,
(1987) Bio/Technology 5:1201-1204), soybean (McCabe et al., (1988)
Bio/Technology 6:923-926; Hinchee et al., (1988) Bio/Technology 6:915-922;
Chee et al., (1989) Plant Physiol. 91:1212-1218; Christou et al., (1989) Proc.
Natl. Acad. Sci. USA 86:7500-7504; EPO Publication 0 301 749 A2), and corn
(Gordon-Kamm et al., (1990) Plant Cell 2:603-618; and Fromm et al., (1990)
Bio/Technology 8:833-839).

Altered p-hydroxyphenylpyruvate dioxygenase enzyme activity may also be
achieved through the generation or identification of modified forms of the isolated
eukaryotic p-hydroxyphenylpyruvate dioxygenase coding sequence having at least
one amino acid substitution, addition or deletion which encodes an altered
p-hydroxyphenylpyruvate dioxygenase enzyme resistant to a herbicide that
inhibits the unaltered, naturally occurring form. Genes encoding such enzymes
can be obtained by numerous strategies known in the art. A first general strategy
involves direct or indirect mutagenesis procedures on microbes, such as E. coli and
S. cerevisiae (Miller, (1972) Experiments in Molecular Genetics, Cold Spring
Harbor Laboratory, Cold Spring Harbor, NY; Davis et al., (1980) Advanced
Bacterial Genetics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY;
Sherman et al., (1983) Methods in Yeast Genetics, Cold Spring Harbor
Laboratory, Cold Spring Harbor, NY; and U.S. Patent No. 4,975,374) and
cyanobacteria (Bryant, The Molecular Biology of Cyanobacteria; Kluwer
Academic Publishers: Boston, 1995). A second method of obtaining mutant 
herbicide-resistant alleles of the eukaryotic /?-hydroxyphenylpyruvate dioxygenase 
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enzyme involves direct selection in plants. For example, the effect of inhibitors 
on the growth of plants such as Arabidopsis, soybean, or maize may be
determined by plating seeds, sterilized by art-recognized methods, on plates of a
simple minimal salts medium containing increasing concentrations of the
inhibitor. The lowest dose at which significant growth inhibition can be
reproducibly detected is used for subsequent experiments. Mutagenesis of plant
material may be utilized to increase the frequency at which resistant alleles occur
in the selected population. Mutagenized seed material can be derived from a
variety of sources, including chemical or physical mutagenesis of seeds, or
chemical or physical mutagenesis of pollen (Neuffer, In Maize for Biological
Research, Sheridan, ed., Univ. Press, Grand Forks, ND, pp. 61-64 (1982)), which
is then used to fertilize plants and the resulting M1 mutant seeds collected.
Typically, for Arabidopsis, M2 seeds (i.e., progeny seeds of plants grown from
seeds mutagenized with chemicals, such as ethyl methane sulfonate, or with
physical agents, such as gamma rays or fast neutrons) are plated at densities of up
to 10,000 seeds/plate (10 cm diameter) on minimal salts medium containing an
appropriate concentration of inhibitor. Seedlings that continue to grow and
remain green 7-21 days after plating are transplanted to soil and grown to maturity
and seed set. Progeny of these seeds are tested for resistance to the herbicide. If
the resistance trait is dominant, plants whose seed segregate 3:1
(resistant:sensitive) are presumed to have been heterozygous for the resistance at
the M2 generation. Plants that give rise to all resistant seed are presumed to have
been homozygous for the resistance at the M2 generation. Such mutagenesis on
intact seeds and screening of their M2 progeny seed can also be carried out on
other species, for instance soybean (see, e.g., U.S. Patent No. 5,084,082). Mutant
seeds to be screened for herbicide tolerance can also be obtained as a result of
fertilization with pollen mutagenized by chemical or physical means.
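
The 3:1 segregation test described above can be illustrated with a chi-square goodness-of-fit calculation. The following is a minimal sketch and not part of the disclosed procedure: the seedling counts are hypothetical, the function names are invented for illustration, and the 5% critical value for one degree of freedom (3.841) is the conventional statistical threshold rather than a value taken from this specification.

```python
# Hypothetical illustration: test whether progeny counts fit a 3:1
# (resistant:sensitive) ratio expected from a heterozygous dominant parent.

CRITICAL_5PCT_1DF = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def chi_square_3_to_1(resistant: int, sensitive: int) -> float:
    """Return the chi-square statistic for observed counts against a 3:1 expectation."""
    total = resistant + sensitive
    if total <= 0:
        raise ValueError("at least one seedling must be scored")
    expected_resistant = 0.75 * total
    expected_sensitive = 0.25 * total
    return ((resistant - expected_resistant) ** 2 / expected_resistant
            + (sensitive - expected_sensitive) ** 2 / expected_sensitive)


def consistent_with_3_to_1(resistant: int, sensitive: int) -> bool:
    """True when the counts do not differ significantly from 3:1 at the 5% level."""
    return chi_square_3_to_1(resistant, sensitive) < CRITICAL_5PCT_1DF


if __name__ == "__main__":
    # Hypothetical counts: 152 resistant and 48 sensitive seedlings.
    print(round(chi_square_3_to_1(152, 48), 4), consistent_with_3_to_1(152, 48))
    # A family giving only resistant seed does not fit 3:1 (presumed homozygous parent).
    print(consistent_with_3_to_1(200, 0))
```

A family that fits 3:1 would be scored as derived from a heterozygous M2 plant, while a family that yields only resistant seed fails the test and would be scored as derived from a homozygous M2 plant.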

EXAMPLE 1 
Cloning of a cDNA for Arabidopsis thaliana
p-Hydroxyphenylpyruvate Dioxygenase

The plasmid containing the Arabidopsis thaliana 91B13T7 expressed
sequence tag (Newman et al., (1994) Plant Physiol. 106:1241-1255) was digested
with the restriction enzymes BamHI and EcoRI, and the resulting 400 bp fragment
was used to screen a lambda phage cDNA library of Arabidopsis thaliana
seedlings (Scolnik, P. A. and Bartley, G. E. (1994) Plant Physiol. 104:1469-1470)
according to the following protocol.

E. coli KW251 cells were grown overnight in Luria Broth ("LB") containing
0.2% maltose and 10 mM MgSO4. Cells were pelleted by centrifugation and
resuspended in 10 mM MgSO4 to an OD600 of 0.5. Cell aliquots (0.8 mL) were
mixed with 0.1 mL of diluted phage samples and 7 mL of top agarose (0.7%
agarose in LB containing 10 mM MgSO4) at 45°C, and plated onto 150 mm Petri
dishes containing LB agar. Phage plaques became visible in 5-7 h, at which point
the plates were placed at 4°C.

Phage plaques were transferred to nitrocellulose filters according to standard
techniques, and the filters were hybridized to 32P-radiolabeled probe prepared
according to the method of Feinberg and Vogelstein ((1983) Anal. Biochem.
132:6-13), using the hybridization conditions of Berlyn et al. ((1989) Proc. Natl.
Acad. Sci. 86:4604-4608). After exposure to X-ray film for 48 h, 12 positive
plaques were eluted, plated, and hybridized under the same conditions. A total of
9 plaques that retained positive signals in this second round of hybridization were
subjected to in vivo excision using the ExAssist/SOLR™ system according to the
manufacturer's protocol (Stratagene Cloning Systems, La Jolla, CA). DNA from
the plasmids resulting from in vivo excision of positive plaques was prepared for
DNA sequencing using the Wizard Plus™ kit (Promega, Madison, WI). Eight of
the clones that were sequenced showed strong conservation with available
p-hydroxyphenylpyruvate dioxygenase sequences, whereas the remaining clone
did not correspond to a p-hydroxyphenylpyruvate dioxygenase. Alignment with
known p-hydroxyphenylpyruvate dioxygenase sequences also revealed that two of
the clones correspond to 0.3 kbp fragments from the 3' end of the transcript, and
another two to 1.2 kbp fragments from the 5' end of the transcript. One clone of
each was used to assemble a 1.5 kbp cDNA by ligating at the internal NheI
restriction site (Figure 1). The initial determination of the DNA sequence (SEQ
ID NO:2) of the resulting cDNA clone is shown in Figure 2. Subsequent work
with this DNA fragment required confirmation of some of the features of its
sequence. Approximately ten nucleotide residues were found to have been listed
in error. Thus a corrected sequence for this DNA fragment is listed in SEQ ID
NO:14 and the deduced amino acid sequence is set forth in SEQ ID NO:15. The
revised sequences form the basis for analyses and comparisons reported herein.

EXAMPLE 2 
Overexpression of the Arabidopsis cDNA in E. coli
The deduced amino acid sequence for Arabidopsis p-hydroxyphenylpyruvate
dioxygenase was aligned with the amino acid sequences of
p-hydroxyphenylpyruvate dioxygenase from mouse, pig, and Streptomyces
avermitilis using the Pileup program of GCG (Program Manual for the Wisconsin
Package, Version 8, September 1994, Genetics Computer Group, 575 Science
Drive, Madison, WI, USA 53711). This analysis suggested an additional
29 amino acid extension at the amino terminus of the Arabidopsis sequence
(positions 1-29, Figure 3 and SEQ ID NO:3). This amino-terminal extension was
assumed to be a chloroplast transit peptide which would be absent from the
mature enzyme. Therefore, the chloroplast transit peptide coding sequence was
removed in the course of transferring the p-hydroxyphenylpyruvate dioxygenase
coding sequence from the cloning vector into the expression vector.

The Arabidopsis p-hydroxyphenylpyruvate dioxygenase cDNA was moved
from the pBluescript SK- cloning vector (Stratagene, La Jolla, CA) to the
pET24c(+) expression vector (Novagen, Madison, WI) through the intermediate
cloning vector pT7BlueR (Novagen). The plasmid pGBPPD2 consists of the
Arabidopsis p-hydroxyphenylpyruvate dioxygenase cDNA and the pBluescript
SK- cloning vector (Stratagene). The plasmid pE24CPl consists of the
Arabidopsis p-hydroxyphenylpyruvate dioxygenase cDNA, without the putative
chloroplast transit peptide DNA sequence, and the pET24c(+) expression vector
(Novagen).

The plasmids pGBPPD2 and pT7BlueR (5 µg each) were individually
digested with 20 units of Xba I (New England Biolabs, NEB, Beverly, MA) and
20 units of Hind III (Gibco BRL, Gaithersburg, MD) in NEB restriction enzyme
buffer 2 supplemented with 100 µg/mL bovine serum albumin at 37 °C for 1.75 h.
Digesting pGBPPD2 with the restriction enzymes Xba I and Hind III releases the
5' and 3' ends, respectively, of the p-hydroxyphenylpyruvate dioxygenase cDNA
from the pBluescript SK- polylinker. Products of the digestion were
electrophoretically separated in a 1 percent agarose gel using TRIS/acetate/EDTA (TAE)
buffer and visualized with ethidium bromide staining (Maniatis). Digestion of
pGBPPD2 with the two restriction endonucleases resulted in a 2922 bp vector
band and a 1499 bp p-hydroxyphenylpyruvate dioxygenase cDNA band. Only a
2863 bp band was apparent after digesting pT7BlueR with the two enzymes,
although a 24 bp fragment would also result. The 1499 bp
p-hydroxyphenylpyruvate dioxygenase band and the 2863 bp pT7BlueR band were cut out of the
gel and the associated DNA purified from the agarose using a QIAquick Gel
Extraction Kit (Qiagen, Chatsworth, CA) according to the manufacturer's
instructions. The purified DNA samples were precipitated by the addition of
sodium acetate (pH 5.2) to 0.3 M, 10 µg tRNA (added as carrier), two volumes of
-20 °C ethanol and incubation at -20 °C overnight. Nucleic acid pellets were
collected by centrifugation, washed with 70% ethanol and air dried. Both pellets
were solubilized in 10 µL of TRIS/EDTA (TE) buffer, pH 8 (Maniatis), and then
1 µL of each sample was loaded onto a 1% agarose, TAE gel in separate wells next to
a well containing 4 µL of Mass Ladder (Gibco BRL). All samples were adjusted
to 10 µL with water before loading. DNA was quantified by comparing band
intensities of each sample with Mass Ladder band intensities following ethidium
bromide staining and UV illumination.

Approximately 300 ng of p-hydroxyphenylpyruvate dioxygenase insert was
mixed with 300 ng of double digested pT7BlueR vector in a total volume of 7 µL
and then heated to 45 °C for 5 min followed by cooling on ice. T4 DNA ligase
buffer (Gibco BRL) and 1 unit of T4 DNA ligase (Gibco BRL) were added to the
cooled DNA for a total volume of 10 µL. The ligation mix was incubated at room
temperature for 4 h and then transformed into MAX Efficiency DH5α Competent
Cells (Gibco BRL) of E. coli according to standard procedures (Maniatis).
Transformed bacteria were spread onto LB agar plates supplemented with
100 µg/mL carbenicillin and incubated overnight at 37 °C. Seventeen bacterial
colonies were selected for subsequent analysis. A portion of each colony was
inoculated into a separate 17 x 100 mm polypropylene culture tube (Falcon,
Lincoln Park, NJ) containing 2 mL of liquid LB media and 200 µg/mL
carbenicillin. Liquid bacterial cultures were incubated overnight at 37 °C with
shaking (250 rpm). Plasmid DNA was then isolated using a QIAprep Spin
Plasmid Miniprep Kit (Qiagen) according to the manufacturer's instructions. A
portion (5 µL out of 50 µL total) of each plasmid preparation was digested with
10 units each of Hind III and EcoR V (Gibco BRL) in a total volume of 15 µL
with React 2 buffer (Gibco BRL) for one h. (Note: The EcoR V site in the
pBluescript polylinker was destroyed during the preparation of pGBPPD2 so only
the EcoR V site in the pT7BlueR polylinker would be accessible to the restriction
nuclease). Samples were separated electrophoretically in 1% agarose and
tris/borate/EDTA (TBE) buffer (Maniatis). Bands were visualized with ethidium
bromide staining; 7 out of 17 samples which contained 2 bands (2837 and
1525 bp) contained the p-hydroxyphenylpyruvate dioxygenase insert and were
designated pT7BlueR+PDO1 (see Figure 4).
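
The ligations in this example are specified by mass (ng) of insert and vector. The sketch below converts those masses to an approximate insert:vector molar ratio. It is an illustration only: the average mass of 650 daltons per base pair of double-stranded DNA is a commonly used approximation assumed here, not a value given in this specification, and the function names are hypothetical.

```python
# Illustrative conversion of DNA mass to picomoles and to an insert:vector molar ratio.

AVG_DALTONS_PER_BP = 650.0  # assumed average mass of one base pair of dsDNA


def pmol_of_dsdna(mass_ng: float, length_bp: int) -> float:
    """Approximate picomoles of a double-stranded DNA fragment."""
    if length_bp <= 0:
        raise ValueError("fragment length must be positive")
    return mass_ng * 1000.0 / (length_bp * AVG_DALTONS_PER_BP)


def insert_to_vector_ratio(insert_ng: float, insert_bp: int,
                           vector_ng: float, vector_bp: int) -> float:
    """Molar ratio of insert to vector for the given masses and lengths."""
    return pmol_of_dsdna(insert_ng, insert_bp) / pmol_of_dsdna(vector_ng, vector_bp)


if __name__ == "__main__":
    # 300 ng of the 1499 bp insert with 300 ng of the 2863 bp pT7BlueR fragment.
    print(round(insert_to_vector_ratio(300, 1499, 300, 2863), 2))
    # 100 ng of the 1373 bp insert with 120 ng of the 5245 bp pET24c(+) fragment,
    # the amounts used for the later ligation in this example.
    print(round(insert_to_vector_ratio(100, 1373, 120, 5245), 2))
```

Under the stated assumption, both ligations use a molar excess of insert over vector, roughly 2:1 in the first case and roughly 3:1 in the second.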

In order to remove the putative chloroplast transit sequence, the remaining
45 µL of each prep of pT7BlueR+PDO1 were combined into a single sample and
the DNA content determined spectrophotometrically at A260 (Maniatis). A
portion (5 µg) of pT7BlueR+PDO1 was digested with 16 units of Eco47 III (MBI
Fermentas) in a total volume of 100 µL containing buffer O (MBI Fermentas) at
37 °C for 2 h. The digested plasmid DNA was then precipitated with sodium
acetate and ethanol as above and the resulting dried nucleic acid pellet was
dissolved in 60 µL of React 2 (Gibco BRL) containing 20 units of Nde I (Gibco
BRL) and incubated 2 h at 37 °C. The double digested sample was then loaded
onto a 1% agarose gel in TAE and the large 4166 bp Nde I-Eco47 III fragment
separated from the 196 bp fragment electrophoretically. The large fragment was
cut out of the gel, purified from agarose and precipitated as above.

An oligonucleotide mix was prepared consisting of 100 pmoles each of
oligos CAM32 and CAM33 (SEQ ID NOS:4 and 5, respectively) in a combined
volume of 9.9 µL. The two oligos complement each other to form a 3' blunt end
corresponding to the 5' half of an Eco47 III restriction site and also form a 5'
staggered end which corresponds to the 3' half of an Nde I restriction site.

CAM32: (SEQ ID NO:4)

5'-TATGTCCAAGTTCGTAAGAAAGAATCCAAAGTCTGATAAATTCAAGGTTAAGC-3'

CAM33: (SEQ ID NO:5)

5'-GCTTAACCTTGAATTTATCAGACTTTGGATTCTTTCTTACGAACTTGGACA-3'
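
The geometry of the annealed CAM32/CAM33 duplex described above can be checked computationally. The following sketch is illustrative only: it uses the two oligonucleotide sequences as listed, while the helper names and the partial codon table (restricted to the codons that occur in CAM32) are introduced here for the example and are not part of the specification.

```python
# Check that CAM33 pairs with CAM32 leaving a 5' overhang on one end and a blunt
# end on the other, and translate the reading frame that begins at the ATG.

CAM32 = "TATGTCCAAGTTCGTAAGAAAGAATCCAAAGTCTGATAAATTCAAGGTTAAGC"
CAM33 = "GCTTAACCTTGAATTTATCAGACTTTGGATTCTTTCTTACGAACTTGGACA"

# Partial codon table: only the codons present in CAM32 (illustrative helper).
CODONS = {
    "ATG": "M", "TCC": "S", "TCT": "S", "AAG": "K", "AAA": "K", "TTC": "F",
    "GTA": "V", "GTT": "V", "AGA": "R", "AAT": "N", "CCA": "P", "GAT": "D",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


overhang_length = len(CAM32) - len(CAM33)
five_prime_overhang = CAM32[:overhang_length]          # unpaired 5' bases of CAM32
fully_paired = CAM32[overhang_length:] == reverse_complement(CAM33)
blunt_end = CAM32[-3:]                                  # last three paired bases

# Translate from the ATG, which starts at the second base of CAM32.
orf = CAM32[1:]
peptide = "".join(CODONS[orf[i:i + 3]] for i in range(0, len(orf) - len(orf) % 3, 3))

if __name__ == "__main__":
    print(five_prime_overhang, fully_paired, blunt_end, peptide)
```

The two-base 5' overhang is what an Nde I cut (CA^TATG) leaves behind, and the terminal AGC matches the 5' half of the Eco47 III site (AGC^GCT). The residues following the initiating methionine can be compared with the N-terminal sequencing result reported later in this example.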

The oligo mix was heated to 90 °C for 1.5 min and then allowed to cool to
room temperature over 20 min. The dried nucleic acid pellet resulting from
purification of the 4166 bp Nde I-Eco47 III fragment was solubilized in 7 µL of
the cooled oligo mix and subsequently heated to 45 °C for 5 min followed by
cooling on ice. Ligation of the oligos with the Nde I-Eco47 III fragment followed
by transformation into DH5α was performed as above. Transformed bacterial
cells were spread onto LB/carbenicillin plates and incubated at 37 °C overnight.
Seventeen colonies were selected and processed to isolate plasmid DNA as above.
A portion (5 out of 50 µL) of each plasmid was double digested with 10 units each
of Nde I and Hind III and the fragments separated electrophoretically on a 1%
agarose gel in TBE. A two band pattern corresponding to insert (1373 or 1518 bp)
and vector (2844 bp) was detected. An additional double digest with 10 units
each of Xba I and Hind III was performed on another 5 µL aliquot of plasmids.
None of the plasmids that showed the smaller insert when digested with Nde I
and Hind III contained an Xba I site. The Xba I site would be eliminated if
the two oligos replaced the 196 bp fragment originally present in pT7BlueR+PDO1.
The 7 plasmid samples with the modified p-hydroxyphenylpyruvate dioxygenase
insert were combined and designated pT7BlueR+PDO2.

The pT7BlueR+PDO2 plasmid DNA was quantified spectrophotometrically
(above) and then 5 µg was digested with 20 units each of Hind III and Nde I in
62 µL of React 2 for 2 h at 37 °C. The digested sample was subsequently loaded
onto a 1% agarose gel in TAE and separated electrophoretically. The 1373 bp
fragment was isolated and precipitated as above. The plasmid pET24c(+) (5 µg)
was double digested with 20 units each of both Nde I and Hind III in React 2 at
37 °C for 2 h and the 5245 bp fragment then gel purified on a 1% agarose gel in
TAE and subsequently separated from agarose and precipitated as above. The
dried pET24c(+) pellet was solubilized in 10 µL TE and then 8 µL was adjusted to
a 20 µL total volume with water, dephosphorylation buffer (Gibco BRL) and
1 unit of calf intestinal alkaline phosphatase (Gibco BRL). The sample was
incubated at 37 °C for 30 min and then gel purified, separated from agarose, and
precipitated as above. The dried, dephosphorylated, pET24c(+) vector pellet and
modified p-hydroxyphenylpyruvate dioxygenase insert pellet were each solubilized
in 10 µL TE and then 1 µL of each was run on a 1% agarose TBE gel with 4 µL of
mass ladder to quantify DNA as above. One hundred nanograms of modified
p-hydroxyphenylpyruvate dioxygenase insert was mixed with 120 ng of
dephosphorylated pET24c(+) vector in a total volume of 7 µL. The mix was
heated to 45 °C for 5 min and then cooled on ice. The mix was then supplemented
with T4 DNA ligase buffer and 1 unit of T4 DNA ligase in a total volume of
10 µL and the mix allowed to incubate at room temperature for 4 h. The ligation
mix was subsequently transformed into DH5α, spread on LB agar supplemented
with 30 µg/mL kanamycin, and incubated overnight at 37 °C. Plasmid
preparations were performed on 11 colonies as above. Plasmids were double
digested with Nde I and Hind III and fragments separated electrophoretically. All
plasmids had the expected 1373 bp and 5245 bp fragments. One bacterial colony
was selected and used to inoculate 100 mL of liquid LB supplemented with
30 µg/mL kanamycin which was subsequently incubated at 37 °C overnight with
shaking. Plasmid DNA was isolated from the resulting bacterial culture using a
Qiagen Plasmid Midi Kit according to the manufacturer's instructions. A portion
of the plasmid DNA (pE24CPl) was sequenced with the Sequenase Version 2.0
DNA Sequencing Kit (United States Biochemical, Cleveland, OH) using a
biotinylated sequencing primer to the T7 promoter (United States Biochemical)
according to the manufacturer's instructions for non-radioactive manual
sequencing. DNA was transferred from the sequencing gel to Hybond-N+ nylon
transfer membrane (Amersham, Arlington Heights, IL) by capillary action.
Transfer and all subsequent steps in chemiluminescent detection of DNA
fragments were performed with a SEQ-Light Chemiluminescent Sequencing
System kit (Tropix, Bedford, MA) according to the manufacturer's instructions.
DNA sequencing verified that the plasmid contained the expected 5' sequence for
the modified p-hydroxyphenylpyruvate dioxygenase insert where nucleotides 1-95
(Figure 2) were replaced with an ATG translational start site. This is equivalent
to amino acids 2-29 (Figure 3) being eliminated from the N-terminus of the
Arabidopsis p-hydroxyphenylpyruvate dioxygenase amino acid sequence.

The plasmid pE24CPl was transformed into competent cells of BL21(DE3)
E. coli (Novagen), as above. Transformed cells were spread on LB/kanamycin
plates and incubated overnight at 37 °C. Seven colonies were selected for plasmid
preparations as above and plasmid DNA was double digested with Nde I and
Hind III to verify that all plasmids had the expected electrophoretic banding
pattern. One colony was selected and streaked for isolation on LB/kanamycin
plates. A well isolated colony was used to inoculate liquid LB supplemented with
30 µg/mL kanamycin and the culture was incubated at 37 °C with shaking
(250 rpm) until it reached an A600 of 0.6 absorbance units. An 8% glycerol
freezer stock was prepared according to the Novagen protocol and stored at
-80 °C. All subsequent expression studies were done with freshly grown bacterial
cells that were isolated from LB/kanamycin plates streaked from the glycerol
freezer stock.

BL21(DE3) E. coli cells containing either pE24CPl or pET24c(+) (negative
control) were streaked out onto LB/kanamycin plates from a glycerol freezer stock
(above) and incubated overnight at 37 °C. One isolated colony was selected for
inoculation of 2 mL of LB containing 30 µg/mL kanamycin in a 17 x 100 mm
Falcon tube, and the culture was incubated at 37 °C with shaking (250 rpm)
overnight. The overnight cultures were then used to inoculate 100 mL of fresh LB
containing 30 µg/mL kanamycin. The new cultures were incubated at 37 °C with
shaking until the A600 reached between 0.4 and 0.6 absorbance units. One half of
the pE24CPl and pET24c(+) cultures were placed in new culture flasks and IPTG
(isopropylthio-β-D-galactoside; Gibco BRL) was added to the new flasks to give a
final concentration of 1 mM. The flasks were incubated an additional 3 h at 37 °C
with shaking, and then the cells were harvested.

The harvested cells were centrifuged and the resulting cell pellet extracted
by sonication (3 x 10 sec bursts) in 2 mL extraction buffer (50 mM (20 mM in the
first experiment; Table 2) potassium phosphate buffer, pH 7.2, containing 0.14 M
KCl, 0.32 mM reduced glutathione, 1% polyvinylpolypyrrolidone, and 0.1%
Triton X-100 (0.01% lysozyme was included in the first experiment only)). The
lysate represents the crude extracted enzyme after centrifugation at 17,000 x g for
10 min. In the first experiment (Table 2) a 20 to 60% ammonium sulfate
precipitated enzyme fraction was also assayed. Solid ammonium sulfate was
slowly added with stirring to 2 mL of the lysate to bring the concentration to 20%
(w/v). After incubation on ice for approximately 15 min, the solution was
centrifuged at 17,000 x g for 10 min. The supernatant liquid was harvested and solid
ammonium sulfate was added to increase the concentration to 60% (w/v). After
centrifugation, the resulting pellet was resuspended in 1 mL of the extraction
buffer.

A portion of the insoluble protein resulting from expression of Arabidopsis
p-hydroxyphenylpyruvate dioxygenase in bacteria was utilized for N-terminal
sequence analysis. The protein (approximately 180 µg) was suspended in 60 µL
of extraction buffer and then diluted with 5 volumes of sample buffer (62.5 mM
Tris, pH 6.8, 6 M urea, 160 mM dithiothreitol, 0.01% bromophenol blue)
followed by intermittent vortexing for one hour at room temperature. A 1.5 mm
thick, 12% polyacrylamide resolving gel was prepared for a Mini-PROTEAN II dual
slab cell (Bio-Rad, Hercules, CA) using the manufacturer's instructions. The
polyacrylamide was allowed to polymerize for 3 h and then a stacking gel was
prepared using a preparative comb. The running buffer was prepared according to
the manufacturer's instructions with the addition of 0.1 mM sodium thioglycolate.
The solubilized protein sample was electrophoretically separated using the
manufacturer's instructions. When the bromophenol blue dye front reached the
bottom of the gel, the gel was removed and equilibrated for 5 min in blotting
buffer (10 mM CAPS, pH 11, 10% methanol, balance water). The gel was then
placed in a Mini Trans-Blot Electrophoretic Transfer Cell (Bio-Rad), according to
the manufacturer's instructions, with a ProBlott PVDF membrane (Applied
Biosystems, Foster City, CA) treated according to the manufacturer's instructions.
Electroblotting was done in the presence of blotting buffer at 50 volts for 45 min
in an ice bath. The membrane was then rinsed in water and stained with
Coomassie Blue as described in the ProBlott protocol. The major protein band
was excised from the membrane and subjected to N-terminal amino acid
sequencing on a Beckman (Fullerton, CA) LF3000 protein sequencer. The first
11 cycles identified S-K-F-V-R-K-N-P-K-S-D (see SEQ ID NO:3, amino acids
30-40). This is the expected N-terminus of the modified Arabidopsis
p-hydroxyphenylpyruvate dioxygenase minus the initial methionine (amino acids
30-40, Figure 3).

EXAMPLE 3
p-Hydroxyphenylpyruvate Dioxygenase Enzymatic Activity
of the Plant Protein Expressed in E. coli
Cell cultures with different plasmid constructs were extracted as described
above and assayed by measuring the formation of 14CO2 from
[1-14C]-p-hydroxyphenylpyruvate or 14CO2 and 14C-homogentisate from
[U-14C]-p-hydroxyphenylpyruvate (Lindblad, B., (1971) Clin. Chim. Acta
34:113-121; and Lindstedt, S. and Odelhog, B., (1987) Methods in Enzymology
142:143-148). The labeled substrate was prepared from [1-14C]-L-tyrosine
(55 mCi/mmol; American Radiolabeled Chemicals, Inc., St. Louis, MO) or
[U-14C]-L-tyrosine (498 mCi/mmol; DuPont NEN, Boston, MA). A 50-100 µL
aliquot (5-10 µCi) of the labeled tyrosine stock solution was transferred to a
4 mL glass vial and blown to dryness in a stream of nitrogen at 45°C. To the vial
was added 175 µL of 0.1 M phosphate buffer, pH 6.5, 5 µL catalase (28,700 units
of C-100, Sigma Chemical Co., St. Louis, MO), and 20 µL L-amino acid oxidase
(Sigma A-9253, 6.5 units/mL). The vial was then placed on a shaker water bath
set at 30°C, 60 cycles/min, for 0.5 to 1 h. The reaction mix was then passed
through a small column containing 400 µL Dowex AG 50W-X8 cation exchange
resin. The column was then washed with 1.5 mL of water and the eluate
containing the labeled p-hydroxyphenylpyruvate was collected. The labeled
substrate was either used immediately or stored at -80°C and used within a week
after preparation.

The assay was performed in 14 mL culture tubes capped with serum
stoppers through which a polypropylene well containing 200 µL of 1 N KOH was
suspended. The reaction mixture contained 5,740 units of catalase, 100 µL of a
freshly prepared 1:1 (v:v) mixture of 150 mM reduced glutathione and 3 mM
dichlorophenolindophenol, 5 mM ascorbate, 0.1 mM ferrous sulfate (the ascorbate
and ferrous sulfate were not present in the buffer used in the first experiment;
Table 2), 50 µM unlabeled p-hydroxyphenylpyruvate, 1-25 µL of the enzyme
extract, and 50 mM potassium phosphate buffer in a final volume of 980 µL.
Unlabeled substrate was made fresh daily in 50 mM potassium phosphate buffer
and allowed to equilibrate for at least 2 h at room temperature to ensure that
greater than 95% was in the keto form. The tubes were incubated for 10 min at
30°C in a shaking water bath prior to adding 20 µL (0.04 µCi) of
14C-p-hydroxyphenylpyruvate. The reaction was terminated after 60 min by
injecting 500 µL of 1 N sulfuric acid through the serum stopper. The vials were
left on the shaker for another 30 min to ensure complete capture of the released
14CO2. The serum caps were then removed and the wells cut and dropped into
8 mL scintillation vials. Six mL of Formula-989 scintillation fluid (Packard
Instruments, Meriden, CT) was added to the vials and the 14C radioactivity was
determined by scintillation counting. Table 2 summarizes the results of this
experiment.

Table 2

p-Hydroxyphenylpyruvate Dioxygenase Activity of Extracts from
E. coli Containing Different Plasmid Constructs

             Inducer        Lysate                      Ammonium Sulfate Precipitate
Plasmid      (1 mM IPTG)    dpm*/mg    nmol/min x mg    dpm*/mg      nmol/min x mg
pET24c(+)    -              12,318     0.09             0            0.00
pET24c(+)    +              35,115     0.25             3,393        0.03
pE24CPl      -              24,607     0.17             126,761      0.89
pE24CPl      +              243,801    1.71             1,371,823    9.64

* 14C:12C = 1:50; sp. act. of 14C-p-hydroxyphenylpyruvate = 55 mCi/mmol



The results show there was little or no p-hydroxyphenylpyruvate
dioxygenase activity in any of the cell cultures that did not have both the plasmid
containing the nucleic acid fragment encoding p-hydroxyphenylpyruvate
dioxygenase (pE24CPl) and the inducer of gene expression (IPTG). The gene
and inducer together resulted in a marked increase in activity.
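
The relationship between the dpm/mg and nmol/min x mg columns of Table 2 can be sketched as follows. This is an illustration under stated assumptions, not the calculation actually used for the table: it assumes the 1:50 footnote means one part labeled substrate per 50 parts unlabeled substrate, that the counts accumulated over the 60 min reaction, and it uses the standard conversion of 2.22 x 10^6 dpm per µCi. The function names are hypothetical, and the output only approximates the tabulated values.

```python
# Convert released 14CO2 counts (dpm per mg protein) to nmol substrate per min per mg.

DPM_PER_MICROCURIE = 2.22e6  # standard definition: 1 uCi = 2.22e6 disintegrations/min


def dpm_per_nmol(specific_activity_mci_per_mmol: float,
                 unlabeled_parts_per_labeled_part: float) -> float:
    """dpm expected per nmol of total substrate after isotopic dilution."""
    # mCi/mmol equals uCi/umol; divide by 1000 to obtain uCi/nmol.
    microcurie_per_nmol = specific_activity_mci_per_mmol / 1000.0
    undiluted = microcurie_per_nmol * DPM_PER_MICROCURIE
    return undiluted / (1.0 + unlabeled_parts_per_labeled_part)


def nmol_per_min_per_mg(dpm_per_mg: float, reaction_minutes: float,
                        substrate_dpm_per_nmol: float) -> float:
    """Specific enzyme activity from counts released during the reaction."""
    return dpm_per_mg / (reaction_minutes * substrate_dpm_per_nmol)


if __name__ == "__main__":
    factor = dpm_per_nmol(55.0, 50.0)  # assumed reading of the Table 2 footnote
    # Induced pE24CPl lysate from Table 2: 243,801 dpm/mg over a 60 min reaction.
    print(round(nmol_per_min_per_mg(243801, 60.0, factor), 2))
```

Under these assumptions the induced pE24CPl lysate works out to about 1.70 nmol/min x mg, close to the tabulated 1.71; the small difference presumably reflects the exact dilution used.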

In the experiment with [U-14C] p-hydroxyphenylpyruvate ("p-HPPA"), where
both 14CO2 and 14C-homogentisic acid were measured, the reaction was initiated
by adding 50 µL of labeled substrate (0.3 µCi) and was terminated with 100 µL of
10% phosphoric acid. The 14CO2 released was determined by scintillation
counting, while the level of homogentisic acid was determined by HPLC on a
Zorbax RX-C8 column (4.6 x 250 mm) with an in-line radioactivity detector.
Aliquots of 1.7 to 15 µL were taken from the reaction mix after centrifugation and
diluted into the column equilibration buffer prior to injection. Separation was
performed at ambient temperature with a flow rate of 1.0 mL/min and the
following gradient, with solvents A and B being water and methanol, each with 1%
phosphoric acid: 0-2 min, isocratic at 95% A and 5% B; 2-17 min, linear gradient
from 95 to 75% A and 5 to 25% B; 17-19 min, linear gradient from 75 to 5% A
and 25 to 95% B; 19-22 min, isocratic at 5% A and 95% B; 22-24 min, linear
gradient from 5 to 95% A and 95 to 5% B. In this system homogentisate eluted
at 10.8 min. The results from this experiment are shown in Table 3.

Table 3

p-Hydroxyphenylpyruvate Dioxygenase Activity of Cell Extracts
Determined by CO2 Release and Homogentisic Acid Synthesis
from [U-14C] p-Hydroxyphenylpyruvate

             Inducer        nmol/min x mg*
Plasmid      (1 mM IPTG)    14CO2     Homogentisic acid
pET24c(+)    -              0.00      0.00
pET24c(+)    +              0.19      0.00
pE24CPl      -              4.68      4.76
pE24CPl      +              29.12     29.82

* 14C:12C = 1:87.7; sp. act. of 14C[U]-p-HPPA = 498 mCi/mmol
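
The HPLC solvent program described above for the homogentisic acid determination can be written as a table of breakpoints with linear interpolation between them. The sketch below is illustrative only: the breakpoints are taken from the gradient as stated, but the function and variable names are hypothetical, and the programmed composition at a given time is not necessarily the composition at the column, since any system delay volume is ignored.

```python
# Programmed percentage of solvent B (methanol with 1% phosphoric acid) versus time.

# (time in min, %B) breakpoints taken from the gradient description.
GRADIENT_BREAKPOINTS = [
    (0.0, 5.0), (2.0, 5.0), (17.0, 25.0), (19.0, 95.0), (22.0, 95.0), (24.0, 5.0),
]


def percent_b(time_min: float) -> float:
    """Linearly interpolate the programmed %B at a given run time."""
    first_time, last_time = GRADIENT_BREAKPOINTS[0][0], GRADIENT_BREAKPOINTS[-1][0]
    if not first_time <= time_min <= last_time:
        raise ValueError("time is outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT_BREAKPOINTS, GRADIENT_BREAKPOINTS[1:]):
        if t0 <= time_min <= t1:
            return b0 + (time_min - t0) / (t1 - t0) * (b1 - b0)
    raise AssertionError("unreachable for a well-formed breakpoint table")


def percent_a(time_min: float) -> float:
    """Programmed %A (water with 1% phosphoric acid); A and B sum to 100%."""
    return 100.0 - percent_b(time_min)


if __name__ == "__main__":
    # Programmed composition at the reported homogentisate retention time.
    print(round(percent_b(10.8), 2), round(percent_a(10.8), 2))
```

Keeping the program as a breakpoint list makes it straightforward to confirm that each segment starts where the previous one ends, which is a useful check when transcribing a gradient from prose.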

There was a tight correlation between the results from the assays of the two
products of the reaction. The results confirmed there was no significant
p-hydroxyphenylpyruvate dioxygenase activity in either cell culture that did not
contain the nucleic acid fragment encoding p-hydroxyphenylpyruvate
dioxygenase. There was measurable enzyme activity in the absence of the
inducer, but when the inducer was added the activity increased greater than
six-fold over uninduced cultures. These results and those of Table 2 clearly show that
the nucleic acid fragment isolated and overexpressed in E. coli cells encodes a
protein that catalyzes the conversion of p-hydroxyphenylpyruvate to
homogentisate with the release of CO2.

The overexpressed protein was also assayed spectrophotometrically at
ambient temperature using the enol borate-tautomerase assay (Lin, E. C. C. et al.,
(1958) J. Biol. Chem. 233:668-673). The assay buffer contained 0.4 M borate
(adjusted to pH 7.2 with 0.2 M sodium borate), 4 mM ascorbate, 2.5 mM EDTA,
40 µM p-hydroxyphenylpyruvate, and 0.5 units of tautomerase (Sigma T-6004)
per 10 mL buffer. The reaction mix was used when the tautomerization of the
substrate was complete (when absorbance at 308 nm had stabilized). The assay
was initiated by adding 40 µL of the cell extracts to 960 µL of the assay buffer,
and the reaction was followed by measuring the decrease in absorbance at 308 nm.
Table 4 summarizes the results with extracts of the same four cell cultures
described in Table 3.

Table 4

Spectrophotometric Assay of p-Hydroxyphenylpyruvate
Dioxygenase Activity of Cell Extracts

             Inducer        nmol p-hydroxyphenylpyruvate
Plasmid      (1 mM IPTG)    lost/min x mg*
pET24c(+)    -              1.58
pET24c(+)    +              2.73
pE24CPl      -              4.91
pE24CPl      +              22.32

* Loss of p-hydroxyphenylpyruvate based on a molar extinction
coefficient for the equilibrium mixture of 9850 as reported by
Lin et al. ((1958) J. Biol. Chem. 233:668-673).
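
The conversion from a rate of absorbance decrease at 308 nm to the units of Table 4 follows from the Beer-Lambert law. The sketch below is a hedged illustration: the extinction coefficient of 9850 and the 1 mL assay volume (40 µL extract plus 960 µL buffer) come from the text, whereas the 1 cm path length, the units of M^-1 cm^-1 for the coefficient, the function name, and the example absorbance rate and protein amount are assumptions made for the example.

```python
# Beer-Lambert conversion of an absorbance change to specific enzyme activity.


def specific_activity_nmol_per_min_mg(delta_a308_per_min: float,
                                      protein_mg: float,
                                      epsilon_per_molar_cm: float = 9850.0,
                                      assay_volume_ml: float = 1.0,
                                      path_length_cm: float = 1.0) -> float:
    """nmol substrate lost per min per mg protein from the rate of A308 decrease."""
    if protein_mg <= 0 or epsilon_per_molar_cm <= 0 or path_length_cm <= 0:
        raise ValueError("protein, extinction coefficient and path length must be positive")
    # Concentration change in mol/L per min.
    molar_per_min = abs(delta_a308_per_min) / (epsilon_per_molar_cm * path_length_cm)
    # Amount changed in the cuvette, converted from mol to nmol.
    nmol_per_min = molar_per_min * (assay_volume_ml / 1000.0) * 1e9
    return nmol_per_min / protein_mg


if __name__ == "__main__":
    # Hypothetical reading: A308 falls by 0.010 per min with 0.05 mg protein in the cuvette.
    print(round(specific_activity_nmol_per_min_mg(-0.010, 0.05), 2))
```

Because the result scales inversely with the extinction coefficient, any change in that value (for example, a different buffer or wavelength) would rescale all of the tabulated activities by the same factor.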

EXAMPLE 4 

Inhibition of p-Hydroxyphenylpyruvate Dioxygenase by Commercial Herbicides
The enzymatic activity of the overexpressed protein is inhibited by two
herbicides known to inhibit plant p-hydroxyphenylpyruvate dioxygenase:
sulcotrione (2-(2-chloro-4-methanesulfonylbenzoyl)-1,3-cyclohexanedione) and
isoxaflutole (5-cyclopropylisoxazol-4-yl 2-mesyl-4-trifluoromethylphenyl
ketone). These two compounds were tested against the overexpressed protein
using both the 14CO2 and the continuous spectrophotometric enol
borate-tautomerase assays. Both compounds were added to the assay buffers in 10 µL of
acetone or dimethyl sulfoxide. The I50 values (concentration inhibiting the
enzyme 50%) were calculated based on the percent inhibition observed over
several concentrations of the inhibitor. The results of the assays are shown in
Table 5.

Table 5

I50 Values of Inhibitors of Plant p-Hydroxyphenylpyruvate Dioxygenase

                I50 value (nM) derived from
Compound        14CO2 assay    spectrophotometric assay
sulcotrione     43             44
isoxaflutole    409            1042
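
One simple way to obtain an I50 from percent inhibition measured at several inhibitor concentrations is to interpolate on a logarithmic concentration scale between the two points that bracket 50% inhibition. The sketch below shows that approach only as an illustration: the specification does not state which calculation was used, the function name is hypothetical, and the data in the usage example are invented.

```python
# Estimate I50 by log-linear interpolation between the points bracketing 50% inhibition.
import math


def estimate_i50(concentrations_nm, percent_inhibition):
    """Return the concentration (nM) at which interpolated inhibition equals 50%."""
    points = sorted(zip(concentrations_nm, percent_inhibition))
    if any(conc <= 0 for conc, _ in points):
        raise ValueError("concentrations must be positive for a log scale")
    for (c_low, i_low), (c_high, i_high) in zip(points, points[1:]):
        if i_low <= 50.0 <= i_high and i_high > i_low:
            fraction = (50.0 - i_low) / (i_high - i_low)
            log_i50 = math.log10(c_low) + fraction * (math.log10(c_high) - math.log10(c_low))
            return 10.0 ** log_i50
    raise ValueError("the data do not bracket 50% inhibition")


if __name__ == "__main__":
    # Hypothetical dose-response data (nM, % inhibition); not taken from Table 5.
    print(round(estimate_i50([1, 10, 100, 1000], [5, 25, 75, 95]), 2))
```

A fitted sigmoidal dose-response model would normally give a more robust estimate than two-point interpolation, particularly when the measured inhibition values are noisy near 50%.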



These results clearly show that the p-hydroxyphenylpyruvate dioxygenase
activity of the overexpressed protein is inhibited by commercial herbicides that
have inhibition of this enzyme as their mode of action. Moreover, the continuous
spectrophotometric assay gave similar I50 values to those obtained with the 14CO2
assay. The spectrophotometric assay can be adapted to a high capacity screen for
inhibitors of p-hydroxyphenylpyruvate dioxygenase by converting it to a microtiter
plate assay combined with a plate reader that reads at or near 308 nm.
Furthermore, any colorimetric or fluorescent assay for homogentisate or
p-hydroxyphenylpyruvate could also be readily adapted into a high
capacity screen for inhibitors of this enzyme. The isolated overexpressed enzyme
has sufficient activity to be used directly in a spectrophotometric assay or it can be
further purified for enhanced assay sensitivity.

EXAMPLE 5

Re-construction of the Full-length p-Hydroxyphenylpyruvate Dioxygenase Gene
for Production of Active, Stable Enzyme in Bacteria

The plasmid pT7BlueR+PD02, described in Example 2 and containing the
full-length p-hydroxyphenylpyruvate dioxygenase gene, proved to have incorrect
sequence at the EcoRI site. This region was re-sequenced so that an oligonucleotide
could be designed to replace the EcoRI site with an NdeI site using conventional
loop-out mutagenesis. The oligonucleotide was designed so that this procedure
also introduced an ATG initiation codon at the 5'-end of the
p-hydroxyphenylpyruvate dioxygenase gene, followed by the full-length
p-hydroxyphenylpyruvate dioxygenase sequence. After mutagenesis, the clone was
amplified in E. coli and the plasmid was purified. The resulting full-length gene,
"PDO-B," was then digested with the enzymes NdeI and NheI, and the ~820 bp
fragment was used to replace the NdeI - NheI segment of the truncated
p-hydroxyphenylpyruvate dioxygenase gene, "PDO-A," in pE24CPl (Example 1).
The resulting plasmid, pE24PDO-B, can be expressed in bacteria to produce the
full-length Arabidopsis p-hydroxyphenylpyruvate dioxygenase enzyme, as
determined by enzyme activity and N-terminal sequence analysis.
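
The fragment swap described in this Example depends on each enzyme cutting the
gene once. As a hedged sketch of how the NdeI and NheI sites could be located in
a sequence and the approximate size of the intervening fragment estimated, the
following Python code searches for the recognition sequences CATATG (NdeI) and
GCTAGC (NheI). The function names and the toy sequence are hypothetical; the
distance returned is measured between the starts of the two recognition sites
and ignores the exact cut positions within each site.

```python
# Recognition sequences of the two enzymes named in this Example.
SITES = {"NdeI": "CATATG", "NheI": "GCTAGC"}


def find_sites(seq, site):
    """Return the 0-based start positions of every occurrence of site."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def approx_fragment_length(seq, left_enzyme, right_enzyme):
    """Approximate fragment size between two unique restriction sites."""
    left = find_sites(seq, SITES[left_enzyme])
    right = find_sites(seq, SITES[right_enzyme])
    if len(left) != 1 or len(right) != 1:
        raise ValueError("each enzyme must cut exactly once")
    if left[0] >= right[0]:
        raise ValueError("left site must precede right site")
    return right[0] - left[0]


# Toy sequence for illustration only; it is not part of any SEQ ID NO.
toy = "GG" + "CATATG" + "A" * 20 + "GCTAGC" + "TT"
print(approx_fragment_length(toy, "NdeI", "NheI"))
```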

EXAMPLE 6

Enhanced Stability of Full Length Construct Over the Truncated Construct

Two different constructs for Arabidopsis thaliana p-hydroxyphenylpyruvate
dioxygenase, one containing the full-length sequence, PDO-B, as
described in Example 5 and produced from plasmid pE24PDO-B, and one
containing the truncated sequence lacking the putative chloroplast leader
sequence, PDO-A, as produced from plasmid pE24CPl, were both purified to the
same extent using a Pharmacia phenyl Sepharose hydrophobic interaction column
followed by gel filtration chromatography on Pharmacia Sephacryl 300. The two
proteins were diluted to 1 mg/mL in 20 mM bis tris-propane buffer, pH 7.2,
containing 5 mM ascorbate, 1 mM reduced glutathione and 0.1 mM ferrous
ammonium sulfate and stored in a refrigerator at 4 °C for up to 10 days. Aliquots
were removed at various times and assayed for activity using the tautomerase
coupled spectrophotometric assay. Under these conditions the half-life for the
activity of the full length enzyme was 4 days, whereas the truncated enzyme
preparation had a half-life of 9 to 10 hours. In addition, the activity of the full
length enzyme could be restored by incubation with iron and reducing agent,
reduced glutathione or ascorbate, or by dialysis against buffer containing iron and
reducing agent. In contrast, the activity of the truncated enzyme could not be
restored by incubation with or dialysis against buffer containing iron and reducing
agent. The full-length enzyme was also more stable in the spectrophotometric
assay, showing a 2 to 3 times longer useful linear region than the truncated
enzyme. Both enzyme preparations showed similar I50 values with the
herbicidally active inhibitors.

These results clearly show that the full-length PDO-B construct has
decided advantages over the truncated enzyme due to the enhanced stability under
storage conditions, in the spectrophotometric assay and in the reversible
reconstitution of activity in the presence of iron and reducing agent. While both
enzyme constructs can be used for screening of inhibitors, the PDO-B enzyme is
preferred for this application and is far superior for mechanistic and structural
studies.
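
To make the practical consequence of the two half-lives concrete, the following
Python sketch estimates the fraction of activity remaining after a given storage
time. It assumes simple first-order (exponential) loss of activity, which is not
stated in this Example, and uses 9.5 hours as a midpoint of the reported 9 to 10
hour range; the function name is hypothetical.

```python
def fraction_remaining(hours_stored, half_life_hours):
    """Fraction of initial activity left, assuming first-order decay."""
    if half_life_hours <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (hours_stored / half_life_hours)


FULL_LENGTH_HALF_LIFE_H = 4 * 24   # 4 days, as reported for PDO-B
TRUNCATED_HALF_LIFE_H = 9.5        # assumed midpoint of 9 to 10 hours (PDO-A)

# Compare the two preparations after one day of storage at 4 degrees C.
for name, t_half in [("PDO-B", FULL_LENGTH_HALF_LIFE_H),
                     ("PDO-A", TRUNCATED_HALF_LIFE_H)]:
    print(name, round(fraction_remaining(24, t_half), 2))
```

Under these assumptions, most of the full-length enzyme activity would survive
overnight storage, whereas the truncated enzyme would retain only a minor
fraction of its initial activity.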

EXAMPLE 7

Cloning of the Maize p-Hydroxyphenylpyruvate Dioxygenase Gene

Approximately 600,000 plaques of a Stratagene maize Uni-Zap cDNA
library (from young plants) were screened by filter hybridization under moderate
stringency using a heterologous probe. The probe was prepared by PCR and was
a 916 bp fragment of DNA having the sequence defined by the region extending
from position 263 to 1178 of SEQ ID NO:14. Twenty-four positive phage clones
were identified in the primary screen, and eleven phage clones were recovered
from a secondary screen. Seven positive clones were submitted for sequencing,
and four showed significant sequence conservation at the amino acid level when
compared with the Arabidopsis thaliana p-hydroxyphenylpyruvate dioxygenase
protein. The longest of the four contained an insert of 988 bp and showed 70%
identity and 78% similarity with the Arabidopsis protein, but was lacking
approximately 550 bp corresponding to the amino terminal end of the protein.

Attempts to obtain a full-length cDNA of the maize p-hydroxyphenylpyruvate
dioxygenase gene were unsuccessful, possibly because the secondary
structure of the RNA inhibited efficient reverse transcription of this transcript.
Two additional cDNA libraries were screened and clones long enough to contain a
full-length cDNA were sequenced. All of these clones were shown to be
chimeras. Therefore a genomic library was screened to obtain the 5' one-third of
the gene. Approximately 1 million clones from a Clontech Zea mays (var. B73)
library in the phage vector EMBL3 (whole seedlings, 2 leaf stage) were screened
using a 415 bp EcoRI-BssHII fragment containing the 5' end of the truncated corn
p-hydroxyphenylpyruvate dioxygenase cDNA (clone H1011C). Eight positive
primary phage clones were plated and screened, and four secondary clones were
picked. DNA was prepared from each using the Qiagen Lambda midi-kit.
Restriction digests with SalI or EcoRI indicated that two clones were the same.
DNA samples from the remaining 3 clones (11.1.3, 13.1.1, and 21.2.1) were
digested with SalI, EcoRI, or SalI and EcoRI, prepared for Southern analysis, and
probed with the full length Arabidopsis p-hydroxyphenylpyruvate dioxygenase
gene. Two of the clones (11.1.3 and 13.1.1) showed sequence conservation, and
these homologous fragments were subcloned and sequenced. Both clones
appeared to contain the full-length gene and each contained one intron near the 3'
end of the gene. However, there were differences between the sequences of the
two clones, indicating that they may be two different genes or that one may be a
pseudogene. The sequence of clone 11.1.3 matched the cDNA sequence, and this
clone was used to construct a full length p-hydroxyphenylpyruvate dioxygenase
coding region.

The gene was contained on two adjacent fragments, a 3.5 kb EcoRI - SalI
fragment and a 2 kb SalI fragment. Both were subcloned into pBluescript SKII+,
resulting in the plasmids pES1113 and pSal1113. pES1113 was digested with
SpeI to release approximately 2.7 kb of upstream sequence and then religated,
resulting in a plasmid with an insert of 747 base pairs (pSPE1). pSPE1 was
digested with SalI to linearize the plasmid and ligated with the 2 kb SalI fragment
from pSal1113, which had been released by digestion with SalI and gel purified.
Orientation was confirmed by digestion with SpeI and Bpu1102I, and the correct
plasmid was named p1113. In order to remove the intron contained in the 3' end
of the genomic clone, the plasmid was digested with Bpu1102I and XhoI, and the
3.9 kb fragment containing the vector and 5' part of the gene was gel purified.
The corresponding 882 bp Bpu1102I-XhoI fragment from pH1011c (cDNA) was
gel purified and ligated with this 3.9 kb fragment, resulting in the clone pMPDO
(ATCC 209120), which contains a 1782 bp insert. There are 260 base pairs
upstream of the putative ATG and 189 base pairs downstream of the stop codon.
The full-length sequence was confirmed by sequencing across the insert. The
nucleic acid sequence and the deduced protein sequence for corn
p-hydroxyphenylpyruvate dioxygenase are presented in SEQ ID NOS:10 and 11,
respectively. The sequences for p-hydroxyphenylpyruvate dioxygenases obtained
from corn and Arabidopsis were compared using the "Gap" program of GCG
(Program Manual for the Wisconsin Package, Version 9.0-OpenVMS, December
1996, Genetics Computer Group, 575 Science Drive, Madison, WI, USA 53711).
The results of these comparisons indicated that these sequences are approximately
67% identical at the nucleotide level, and they possess 69% similarity and 62%
identity at the amino acid level. The predicted amino acid sequence of corn
p-hydroxyphenylpyruvate dioxygenase is compared with that from Arabidopsis
and other eukaryotes in Figure 3.
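
For readers unfamiliar with how percent identity and percent similarity are
derived from a pairwise alignment, the following Python sketch counts identical
and similar residues over the aligned, ungapped positions of two sequences. It is
not the GCG "Gap" program and will not reproduce its figures: the residue
groupings used to define "similar", the choice of denominator, the function name
and the toy alignment are all assumptions made for illustration.

```python
# Assumed groupings of chemically similar amino acids (one-letter codes).
# GCG "Gap" uses a scoring matrix instead; these sets are illustrative only.
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"),
                  set("DE"), set("NQ"), set("ST"), set("AG")]


def identity_and_similarity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, percent similarity) for two aligned strings.

    Positions where either sequence has a gap are excluded from the count.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1   # identical residues also count as similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Toy alignment for illustration; not derived from any SEQ ID NO.
print(identity_and_similarity("MKV-LDE", "MRVALE-"))
```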

EXAMPLE 8

Composition of a cDNA Library; Isolation and Sequencing of cDNA Clones

A cDNA library representing mRNAs from developing seeds of Vernonia
galamenensis that had just begun production of vernolic acid was prepared. The
library was prepared in a Uni-ZAP™ XR vector according to the manufacturer's
protocol (Stratagene Cloning Systems, La Jolla, CA). Conversion of the
Uni-ZAP™ XR library into a plasmid library was accomplished according to the
protocol provided by Stratagene. Upon conversion, cDNA inserts were contained
in the plasmid vector pBluescript. cDNA inserts from randomly picked bacterial
colonies containing recombinant pBluescript plasmids were amplified via
polymerase chain reaction using primers specific for vector sequences flanking
the inserted cDNA sequences. Amplified insert DNAs were sequenced in
dye-primer sequencing reactions to generate partial cDNA sequences (expressed
sequence tags or "ESTs"; see Adams, M. D. et al. (1991) Science 252:1651). The
resulting ESTs were analyzed using a Perkin Elmer Model 377 fluorescent
sequencer.

EXAMPLE 9

Identification and Characterization of cDNA Clones

ESTs encoding Vernonia galamenensis enzymes were identified by
conducting BLAST (Basic Local Alignment Search Tool; Altschul, S. F., et al.
(1993) J. Mol. Biol. 215:403-410; see also www.ncbi.nlm.nih.gov/BLAST/)
searches for similarity to sequences contained in the BLAST "nr" database
(comprising all non-redundant GenBank CDS translations, sequences derived
from the 3-dimensional structure Brookhaven Protein Data Bank, the last major
release of the SWISS-PROT protein sequence database, EMBL, and DDBJ
databases). The cDNA sequences obtained in Example 8 were analyzed for
similarity to all publicly available DNA sequences contained in the "nr" database
using the BLASTN algorithm provided by the National Center for Biotechnology
Information (NCBI). The DNA sequences were translated in all reading frames
and compared for similarity to all publicly available protein sequences contained
in the "nr" database using the BLASTX algorithm (Gish, W. and States, D. J.
(1993) Nature Genetics 3:266-272) provided by the NCBI. For convenience, the
P-value (probability) of observing a match of a cDNA sequence to a sequence
contained in the searched databases merely by chance, as calculated by BLAST, is
reported herein as a "pLog" value, which represents the negative of the logarithm of
the reported P-value. Accordingly, the greater the pLog value, the greater the
likelihood that the cDNA sequence and the BLAST "hit" represent homologous
proteins.
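
The relationship between a BLAST P-value and the pLog value used in this
Example can be sketched in Python as follows. The function names are
hypothetical, and a base-10 logarithm is assumed, since the base is not stated
above.

```python
import math


def plog(p_value):
    """Negative base-10 logarithm of a BLAST P-value (base 10 is assumed)."""
    if not 0 < p_value <= 1:
        raise ValueError("P-value must be in the interval (0, 1]")
    return -math.log10(p_value)


def p_from_plog(plog_value):
    """Recover the P-value that corresponds to a given pLog value."""
    return 10 ** (-plog_value)


# A pLog of 8.34 corresponds to a chance-match probability of a few parts
# in a billion; larger pLog values correspond to smaller probabilities.
print(p_from_plog(8.34))
print(plog(1e-8))
```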

The BLASTX search using clone vsl.pk0015.b2 revealed similarity of the
protein encoded by the cDNA to a number of p-hydroxyphenylpyruvate
dioxygenases from sources other than plants. The three most similar
p-hydroxyphenylpyruvate dioxygenase proteins were a streptomycete
p-hydroxyphenylpyruvate dioxygenase (GenBank Accession No. U11864;
pLog = 8.34), a rat p-hydroxyphenylpyruvate dioxygenase (GenBank Accession
No. M18405; pLog = 7.66), and a human p-hydroxyphenylpyruvate dioxygenase
(GenBank Accession No. U29895; pLog = 7.60). SEQ ID NO:16 shows the
nucleotide sequence of a portion of the Vernonia galamenensis cDNA in clone
vsl.pk0015.b2. Sequence alignments and BLAST scores and probabilities
indicate that the instant nucleic acid fragment encodes a portion of Vernonia
galamenensis p-hydroxyphenylpyruvate dioxygenase.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:
(A) NAME: E. I. DUPONT DE NEMOURS AND COMPANY
(B) STREET: 1007 MARKET STREET
(C) CITY: WILMINGTON
(D) STATE: DELAWARE
(E) COUNTRY: U.S.A.
(F) POSTAL CODE (ZIP): 19898
(G) TELEPHONE: 302-&92-S112
(H) TELEFAX: 302 -77 3-01 G-l
(I) TELEX: 6717325

(ii) TITLE OF INVENTION: PLANT GENE FOR p-HYDROXYPHENYLPYRUVATE DIOXYGENASE

(iii) NUMBER OF SEQUENCES: 16

(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: DISKETTE, 3.50 INCH
(B) COMPUTER: IBM PC COMPATIBLE
(C) OPERATING SYSTEM: MICROSOFT WORD FOR WINDOWS } r -
(D) SOFTWARE: MICROSOFT WORD VERSION 7.0A

(v) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:

(vi) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 6C/02i,3G<l
(B) FILING DATE: JUNE 27, 1996

(vii) ATTORNEY/AGENT INFORMATION:
(A) NAME: FLOYD, LINDA AXAMETHY
(B) REGISTRATION NUMBER: 33,692
(C) REFERENCE/DOCKET NUMBER: BA-9120

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 233 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CAAGAAACGN GTCGNCGACG TGCTCAGCGA TGATCAGATC AAGGAGTGTG AGGAATTAGG 60

GATTCTTNTA CACAGAGATG ATCAAGGGAC GTTNCTTCAA ATCTNCACAA AACCACTAGG 120

TGACAGGCCG ACGNTATTTA TAGAGATAAT CCAGAGNGTA GGATGCATGA TGAAAGATGT 180

GGAAGGGANG GCTTACCAGA GTGGAGNATN TNGTGGTTTT GGCAAAGGCA ATT 233

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1448 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 9..1343

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

TGAAATCA ATG GGC CAC CAA AAC GCC GCC GTT TCA GAG AAT CAA AAC CAT SO 
Met Gly His Gin Asn Ala Ala Val Ser Glu Asn Gin Asn His 
1 5 10 

GAT GAC GGC GCT CCG TCG TCG CCG GGA TTC AAG CTC GTC GGA TTT TCC 9 5 

Asp Asp Gly Ala Ala Ser Ser Pro Gly Phe Lvs Lou Val GIv Phe Ser 

20 25 ' 30 

AAG TTC GTA AGA AAG AAT CCA AAG TCT GAT AAA TTC AAG GTT AAG CGC { 4 G 

Lys Phe Val Arg Lys Asn Pro Lys Ser Asp Lvs Phe Lye Val Lys Arg 
3 3 4 0* 45 

TTC CAT CAC ATC GAG TTC TGG TGC GGG GAC GCA ACC AAC CTC GCT CGT 194 
Phe His His lie Glu Phe Trp Cys Glv Asp Ala Thr Asn Val Ala Arg 
50 55 60 

CGC TTC TCC TGG GGT CTG GGG ATG AGA TTC TCC GCC AAA TCC GAT CTT 24 2 

Arg Phe Ser Trp Gly Leu Gly Met Arg Phe Ser Ala Lys Ser Asp Leu 
65 70 75 

TCC ACC GGA AAC ATG GTT CAC GCC TCT TAC CTA CTC ACC TCC GGT GAA 2 90 

Ser Thr Gly Asn Met Val His Ala Ser Tyr Leu Leu Thr Ser Gly Glu 
80 85 " 90 

CTC CGA TTC CTT TTC ACT GCT CCT TAC TCT CCG TCT CTC TCC GGC GGA 3 38 

Leu Arg Phe Leu Phe Thr Ala Pro Tyr Ser Pro Ser Leu Ser Gly Gly
95                  100                 105                 110

GAG ATT AAA CCG ACA ACC ACA GGT TCT ATC CCA AGT TTC GAT CAC GGG 386

Glu Ile Lys Pro Thr Thr Thr Gly Ser Ile Pro Ser Phe Asp His Gly
                115                 120                 125

TCT TGT CGG TCC TTC TTC TCT TCA CAT GGT CTC GGT GTT AGA CCC GTT 4 34 

Ser Cys Ara Ser Phe Phe Ser Ser His Gly Leu Gly Val Arg Pro Val 
130 135 140 

GCG ATT GAA GTA GAA GAC GCG GAG TCA GCT TTC TCC ATC AGT GTA GCT 4 82 

Ala lie Glu Val Glu Asp Ala Glu Ser Ala Phe Ser He Ser Vai Ala 
145 150 155 



AAT GGC GCT ATT CCT TCG TCG CCT CCT ATC GTC CTC AAT GAA GCA GTT 530
Asn Gly Ala Ile Pro Ser Ser Pro Pro Ile Val Leu Asn Glu Ala Val
    160                 165                 170

ACG ATC GCT GAG GTT AAA CTA TAC GGC GAT GTT GTT CTC CGA TAT GTT 578
Thr Ile Ala Glu Val Lys Leu Tyr Gly Asp Val Val Leu Arg Tyr Val
175                 180                 185                 190

AGT TAC AAA GCA GAA GAT ACC GAA AAA TCC GAA TTC TTG CCA GGG TTC 626
Ser Tyr Lys Ala Glu Asp Thr Glu Lys Ser Glu Phe Leu Pro Gly Phe
                195                 200                 205



GAG C37 GTA GAG GAT GCG TCG TCG TTC CCA TTG GAT TAT GGT ATC CGG 67 4 

Glu Aro Val Glu Asd Aia Ser Ser Phe Pro Leu Asp Tvr Glv He Arc 

210 * 215 ' 220 

CGG CTT GAC CAC GCC GTG GGA AAC GTT CCT GAG CTT GGT CCG GCT TTA 7 22 

Arg Leu Asp His Ala Val Gly Asn Val Pro Glu Leu Glv Pro Aia Leu 

225 230 235 



ACT TAT GTA GCG GGG TTC ACT GGT TTT CAC CAA TTC GCA GAG TTC ACA 77 0 

Thr Tyr Val Aia Gly Phe Thr Gly Phe Mrs Gin Phe Ala Glu Phe Thr 
240 245 250 



GCA GAC GAC GTT GGA ACC GCC GAG AGC GGT TTA AAT TCA GCG, GTC CTG 8 IS 

Ala Asp Asp Val Gly Thr Ala Glu Ser Gly Leu Asn Ser Ala Val Leu 
255 260 265 270 

GCT AGC AAT GAT GAA ATG GTT CTT CTA CCG ATT AAC GAG CCA GTG CAC 8 66 

Ala Ser Asn Asd Glu Met Val Leu Leu Pro lie Asn Glu Pro Vai His 

275 2S0 285 

GGA ACA AAG AGG AAG AGT CAG ATT CAG ACG TAT TTG GAA CAT AAC GAA 914 

Gly Thr Lys Arg Lys Ser Gin He Gin Thr Tyr Leu Glu His Asn Glu 

290 295 300 



GGC GCA GGG CTA CAA CAT CTG GCT CTG ATG AGT GAA GAC ATA TTC AGG 9 62 

Gly Aia Gly Leu Gin His Leu Ala Leu Met Ser Glu Asp lie Phe Ara 
305 310 315 



ACC CTG AGA GAG ATG AGG AAG AGG AGC AGT ATT GGA GGA TTC GAC TTC 1010 

Thr Leu Arg Glu Met Arg Lys Arg Ser Ser lie Gly Gly Phe Asp Phe 
320 325 330 

ATG CCT TCT CCT CCG CCT ACT TAC TAC CAG AAT CTC AAG AAA CGG GTC 1058 

Met Pro Ser Pro Pro Pre Thr Tyr Tyr Gin Asn Leu "Lys Lvs Arg Val 

335 340 345 ' 350 



GGC GAC GTG CTC AGC GAT GAT CAG ATC AAG GAG TGT GAG GAA TTA GGG 1106 
Gly Asp Val Leu Ser Asp Asp Gln Ile Lys Glu Cys Glu Glu Leu Gly
                355                 360                 365

ATT CTT GTA GAC AGA GAT GAT CAA GGG ACG TTG CTT CAA ATC TTC ACA 1154
Ile Leu Val Asp Arg Asp Asp Gln Gly Thr Leu Leu Gln Ile Phe Thr
            370                 375                 380

AAA CCA CTA GGT GAC AGG CCG ACG ATA TTT ATA GAG ATA ATC CAG AGA 1202 
Lys Pro Leu Gly Asp Arg Pro Thr lie Phe lie Giu lie lie Gin Ara 
385 390 395 

GTA GGA TGC ATG ATG AAA GAT GAG GAA GGG AAG GCT TAG CAG AGT 3GA 12 50 
Val Gly Cys Met Met Lys Asp Glu Glu Gly Lys Ala Tyr Gin Ser Gly 
400 405 ' 410 

GGA TGT GGT GGT TTT GCC AAA GGC AAT TTC TCT GAG CTC TTC AAG 7CC 1298 
Gly Cys Glv Gly Phe Ala Lys Giv Asn Phe Ser Giu Leu Phe Ly^ Ser 
415 420 425 430 

ATT GAA GAA TAG GAA AAG ACT CTT GAA GCC AAA CAG TTA GTG GGA 134 3 

lie Giu Glu Tyr Glu Lys Thr Leu Glu Ala Lys Gin Leu Val Gly 
4 35 4 40 44 e 

TGAACAAGAA GAAGAACCAA CTAAAGGATT GTGTAATTAA TGTAAAACTG 7TTT.-.7CTTA 1403

TCAAAACAAT GTATACAACA TCTCATTTAA AAACGAGATC AATCC 1448

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 445 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Met Gly His Gln Asn Ala Ala Val Ser Glu Asn Gln Asn His Asp Asp
1               5                   10                  15

Gly Ala Ala Ser Ser Pro Gly Phe Lys Leu Val Gly Phe Ser Lys Phe
            20                  25                  30

Val Arg Lys Asn Pro Lys Ser Asp Lys Phe Lys Val Lys Arg Phe His
        35                  40                  45

His Ile Glu Phe Trp Cys Gly Asp Ala Thr Asn Val Ala Arg Arg Phe
    50                  55                  60

Ser Trp Gly Leu Gly Met Arg Phe Ser Ala Lys Ser Asp Leu Ser Thr
65                  70                  75                  80

Gly Asn Met Val His Ala Ser Tyr Leu Leu Thr Ser Gly Glu Leu Arg
                85                  90                  95

Phe Leu Phe Thr Ala Pro Tyr Ser Pro Ser Leu Ser Gly Gly Glu Ile
            100                 105                 110

Lys Pro Thr Thr Thr Gly Ser Ile Pro Ser Phe Asp His Gly Ser Cys
        115                 120                 125

Arg Ser Phe Phe Ser Ser His Gly Leu Gly Val Arg Pro Val Ala Ile
    130                 135                 140

Glu Val Glu Asp Ala Glu Ser Ala Phe Ser Ile Ser Val Ala Asn Gly
145                 150                 155                 160

Ala Ile Pro Ser Ser Pro Pro Ile Val Leu Asn Glu Ala Val Thr Ile
                165                 170                 175

Ala Glu Val Lys Leu Tyr Gly Asp Val Val Leu Arg Tyr Val Ser Tyr
            180                 185                 190

Lys Ala Glu Asp Thr Glu Lys Ser Glu Phe Leu Pro Gly Phe Glu Arg
        195                 200                 205

Val Glu Asp Ala Ser Ser Phe Pro Leu Asp Tyr Gly Ile Arg Arg Leu
    210                 215                 220

Asp His Ala Val Gly Asn Val Pro Glu Leu Gly Pro Ala Leu Thr Tyr
225                 230                 235                 240

Val Ala Gly Phe Thr Gly Phe His Gln Phe Ala Glu Phe Thr Ala Asp
                245                 250                 255

Asp Val Gly Thr Ala Glu Ser Gly Leu Asn Ser Ala Val Leu Ala Ser
            260                 265                 270

Asn Asp Glu Met Val Leu Leu Pro Ile Asn Glu Pro Val His Gly Thr
        275                 280                 285

Lys Arg Lys Ser Gln Ile Gln Thr Tyr Leu Glu His Asn Glu Gly Ala
    290                 295                 300

Gly Leu Gln His Leu Ala Leu Met Ser Glu Asp Ile Phe Arg Thr Leu
305                 310                 315                 320

Arg Glu Met Arg Lys Arg Ser Ser Ile Gly Gly Phe Asp Phe Met Pro
                325                 330                 335

Ser Pro Pro Pro Thr Tyr Tyr Gln Asn Leu Lys Lys Arg Val Gly Asp
            340                 345                 350

Val Leu Ser Asp Asp Gln Ile Lys Glu Cys Glu Glu Leu Gly Ile Leu
        355                 360                 365

Val Asp Arg Asp Asp Gln Gly Thr Leu Leu Gln Ile Phe Thr Lys Pro
    370                 375                 380

Leu Gly Asp Arg Pro Thr Ile Phe Ile Glu Ile Ile Gln Arg Val Gly
385                 390                 395                 400

Cys Met Met Lys Asp Glu Glu Gly Lys Ala Tyr Gln Ser Gly Gly Cys
                405                 410                 415

Gly Gly Phe Ala Lys Gly Asn Phe Ser Glu Leu Phe Lys Ser Ile Glu
            420                 425                 430

Glu Tyr Glu Lys Thr Leu Glu Ala Lys Gln Leu Val Gly
        435                 440                 445

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 53 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

TATGTCCAAG TTCGTAAGAA AGAATCCAAA CTCTGATAAA TTCAAGGTTA AGC 53

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 51 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

GCTTAACCTT GAATTTATCA GACTTTGGAT TCTTTCTTAC GAACTTGGAC A 51

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 392 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Thr Ser Tvr Ser Asp Lvs Gly Glu I.ys Pro Glu Arq Gly Arq Phe Leu 
1 5 iO 15 

His Phe His Ser Val Thr Pne Trp Vol C I y Asn Ala Lys Glr. Ala Ala 
20 25 30 

Ser Tyr Tyr Cys Ser Lys lie Gly Phe Glu Pro Leu Ala Tyr I.ys Gly 
3 5 4 0 <\b 

Leu Glu Thr Gly Ser Arg Glu Val Vai Ser His Vai Vai Lys Gin Asp 
50 5 5 GO 

Lys lie Val Phe Val Phe Sor Ser Ala Leu Asn Trp Asn Lys Glu 

65 70 73 SO 

Met Gly Asp His Leu Val Lys His Gly Asp Gly Val Lys Asp lie Ala 
8 5 9 0* 9 5 

Phe Glu Val Glu Asp Cys Asd Tyr lie Vai Gin Lys Ala Arc Glu Arq 
100 * 105 110 

Gly Ala lie lie Vai Arg Glu Glu Vai Cys Cys Ala Ala Asp Val Arq 
115 120 125 

Gly His His Thr Pro Leu Asp Arg Ala Arq Gin Val Trp Glu Gly Thr 
130 135 140 

Leu Vai Glu Lys Met Thr Phe Cys Leu Asp Ser Arq Pro Gin Pro Ser 
145 150 155 160 

Gin Thr Leu Leu His Arg Leu Leu Leu Ser Lys Lou Pro Lys Cys Gly 
165 ' 170 i"5 

Leu Glu Ile Ile Asp His Ile Val Gly Asn Gln Pro Asp Gln Glu Met
            180                 185                 190

Glu Ser Ala Ser Gln Trp Tyr Met Arg Asn Leu Gln Phe His Arg Phe
        195                 200                 205

Trp Ser Val Asp Asd Thr Gin He His Thr Glu Tvr Ser Ala Leu Arg 
210 215 220 

Ser Val Val Met Ala Asn Tyr Glu Glu Ser lie Lvs Me: Pro He Asn 
225 230 235 ' 240 

Glu Pro Ala Pro Giy Lys Lys Lys Ser Gin lie Gin Glu Tvr Val Asp 
245 250 ' 255 

Tyr Asn Cly Giy Ala Gly Vai Gin His He Ala Leu Lvs Thr Glu Asd 
260 265 * 270 

He lie Thr Ala lie Arg Ser Leu Ara Glu Arg GIv Vai Giu Phe Leu 
275 280 ' " 285 

Ala Val Pro Phe Thr Tyr Tvr Lvs Gin Leu Gin Glu Lvs Leu Lvs Ser 
29C 295 300 

Ala Lys lie Arg Val Lvs Glu Ser lie Asp Val Leu G " u Glu Leu Ly^ 
305 310 ' 3X5 320 

lie Leu Vai Asp Tyr Asp Glu Lys GIv Tvr Leu Leu Glr. He Phe Thr 
325 ' 330 335 

Lys Pro Met Gin Asp Arg Pro Thr Val Phe Leu Glu Val He Gin Arg 
340 345 350 

Asn Asn His Gin Giy Phe Giy Ala Giv Asn Phe Asn Ser Leu Phe Lys 
355 360 365 

Ala Phe Glu Glu Glu Gin Giu Leu Arg Giv Asn Leu Thr Aso Thr Asp 
370 375 " 330 

Pro Asn Gly Val Pro Phe Arg Leu 
335 390 

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 392 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

Thr Ser Tyr Ser Asp Lys Giy Glu Lys Pro Glu Arg GIv Arq Phe Leu 
1 5 10 15 

His Phe His Ser Val Thr Phe Trp Val Gly Asn Ala Lys Gin Ala Ala 
20 25 30 

Ser Tyr Tyr Cys Ser Lys He Gly Phe Glu Pro Leu Ala Tyr Lys Giy 
35 40 45 

Leu Glu Thr Gly Ser Arg Giu Val Vai Ser His Val Val Lvs G<n Asp 
50 55 60 

Lys Ile Val Phe Val Phe Ser Ser Ala Leu Asn Pro Trp Asn Lys Glu
65                  70                  75                  80

Met Gly Asp His Leu Val Lys His Gly Asp Gly Val Lys Asp Ile Ala
                85                  90                  95

Phe Glu Val Giu Asp Cys Asp Tyr lie Vai Gin Lys Ala Arg Glu Arq 
100 " 105 110 

Gly Ala lie lie Val Ara Glu Glu Val Cys Cys Ala Ala Aso Vai Arg 
115 120 125 

Gly His His Thr Pro Leu Asd Arg Ala Arg Gin Vai Tro Glu Glv Thr 
130 135 140 

Leu Val Glu Lys Met Thr Phe Cys Leu Asd Ser Arq Pro Gin Pro Ser 
145 150 " 155 ' 160 

Gin Thr Leu Leu His Arg Leu Leu Leu Ser Lys Leu Pro Lvs Cys Gly 
165 " 170 ' 175 

Leu Glu lie He Asp His lie Vai Giv Asn Gin Pro Aso Gin Glu Met 
180 isl ' 190 

Glu Ser Ala Ser Gin Trp Tyr Met Ara Ann Leu Gin Phe His Ar- Phe 
195 200 ' 205 

Trp Ser Val Asp Asp Thr Gin lie His Thr Glu Tyr Sei Ala Leu Ara 
210 Jib 220 

Ser Vai Val Met Ala A sr. Tyr Giu Giu Ser lie Lys Met Pro lie Asn 
225 230 " 235 240 

Glu Pro Aia Pro Civ Lys Lvs Lys Ser Gin lie Gin Giu Tyr Val Aso 
■2 4 5 2 50 2 55 

Tyr Asn Gly Gly Aia Gly Vai Gin His lie Ala Leu Lys Thr Giu Asd 
260 265 270 

lie Tie Thr Ala lie Ara Ser Leu Arq G!u Arq Glv Val Giu Phe Leu 
275 280 " " ' 235 

Ala Val Pro Phe Thr Tyr Tyr Lys Gin Leu Gin Giu Lys Leu Lvs Ser 
290 295 300 

Aia Lys lie Arg Val Lys Giu Ser lie Asp Val Leu Glu Glu Leu Lys 
3 0 5 3 10 '.: 1 5 3 2 Q 

lie Leu Val Asp Tyr Asp Giu Lys Gly Tyr Leu Leu Gin lie Phe Thr 
325 330 335 

Lys Pro Met Gin Asp Ara Pro Thr Vai Phe Leu Glu Vai lie Gin Ara 
340 345 350 

Asn Asn His Gin Gly Phe Gly Ala Giv Asn Phe Asn Ser Leu Phe Lys 
355 360 " 365 

Aia Phe Glu Glu Glu Gin Glu Leu Arg Giv Asn Leu Thr Aso Thr Asp 
370 375 * 380 

Pro Asn Gly Vai Pro Phe Arg Leu 
385 390 

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 392 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Thr Thr Tyr Asn Asn Lys Gly Pro Lys Pro Glu Arg Gly Arg Phe Leu 
15 10 15 

His Phe His Ser Val Thr Phe Tro Val Gly Asn Ala Lys Gin Ala Ala 
20 25 30 

Ser Phe Tyr Cys Asn Lys Met Gly Phe Glu Pro Leu Ala Tyr Ara Glv 
35 40 45 

Leu Glu Thr Gly Ser Arq Glu Val Vol Ser His Val lie Lys Ara Glv 
50 55 60 

Lys lie Val Phe Val Leu Cys Ser Ala Leu Asn Pro Trp Asn Lys Giu 
65 70 75 80 

Met Gly Asp His Leu Val Lys His Glv Asd Gly Val Lys Asp lie Ala 
8 5 90" 9 5 

Phe Glu Vai Glu Asp Cys Asp His lie Val Gin Lys Ala Arq Glu Arc 
100 105 110 

Gly Ala Lys lie Val Arq Glu Pro Trp Val Glu Gin Asp Lys Phe Glv 
115 120 115 

Lys Val Lvs Pne Air. Val Leu Gin Thr Tvr Gly Asd Thr Thr His Thr 
130 135 " 140 

Leu Val Glu Lys lie Asn Tyr Thr Gly Arg Phe Leu Pro Glv Phe Giu 
145 150 " 155 " 160 

Ala Pro Thr Tyr Lys Asp Thr Leu Leu Pro Lys Lou Pro Arg Cys Asn 
165 170 ' 175 

Leu Glu lie lie Asp His lie Val Glv Asn Gin Pro Asp Gin Glu Met 
130 190 

Gin Ser Ala Ser Glu Trc Tvr Leu Lvs Asn Leu Gin Phe ills Arq Phe 
195 200 " 205 

Trp Ser Val Asp Asp Thr C + n Vai His Thr Glu Tyr Ser Ser Leu Ara 
210 215 220 

Ser lie Val Val Thr Asn Tyr Glu Giu Ser lie Lvs Met Pro lie Asn 
225 230 235 ' 240 

Giu Pro Ala Pro Gly Arg Lys Lys Ser Gin lie Gin Giu Tyr Val Aso 
245 250 255 

Tyr Asn Gly Gly Ala Gly Val Gin His lie Ala Leu Lys Thr Glu Asd 
260 265 270 

lie lie Thr Ala lie Arc His Leu Arg Glu Arg Glv Thr Glu Phe Leu 
275 280 " ' 285 

Ala Ala Pro Ser Ser Tyr Tyr Lys Leu Leu Arg Giu Asn Leu Lys Ser 
290 295 ' 300 

Ala Lys Ile Gln Val Lys Glu Ser Met Asp Val Leu Glu Glu Leu His
305                 310                 315                 320

Ile Leu Val Asp Tyr Asp Glu Lys Gly Tyr Leu Leu Gln Ile Phe Thr
                325                 330                 335

Lys Pro Met Gin Asp Arg Pro Thr Leu Phe Lou Glu Val lie Gin Arq 
340 345 350 

His Asn His Gin Gly Phe Gly Ala Glv Asn Phe Asn Ser Leu Phe Lvs 
355 360 * 365 

Ala Phe Glu Glu Glu Gin Ala Leu Arg Glv Asn Leu Thr Asp Leu Glu 
370 375 ' 380 

Pro Asn Gly Val Arg Ser Gly Met 
385 390 

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 376 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

Tyr Trp Aso Lys Gly Pro Lys Pro Glu Arg Gly Arg ?he Leu His Phe 
1 " 5 ' 10 " 15 

His Ser Val Thr Phe Trp Val Gly Asn Ala Lvs Gin Ala Ala Ser Phe 
20 25 30 

Tyr Cys Asn Lys Met Gly Phe Glu Pro Leu Ala Tyr Lys Glv Leu Glu 
3 5 AO 4 5 

Thr Gly Ser Arq Glu Val Val Ser His Val i io. Lys Gin Gly Lvs lie 
50 ' 55 b0 

Val Phe Val Leu Cys Ser Ala Leu Asn Pro Tr::- Asn Lvs Glu Met Glv 
65 ' 70 75' 30' 

Asp Mis Leu Val Lys His Gly Asp Gly Val Lys Asp lie Aid Phe Gl.: 
8 5 90 95 

Val Glu Asp Cys Glu His lie Val Gin Lvs Ala Arq Glu Arg GLv Ala 
100 105 110 

Lys lie Val Ara Glu Pro Trp Val Glu Glu Asp Lys Phe Gly Lys Val 
115 120 125 

Lys Phe Ala Val Leu Gin Thr Tyr Gly Asp Thr Thr His Thr Leu Val 
130 135 14 0 

Glu Lys lie Asn Tyr Thr Gly Arg Phe Leu Pro Gly Phe Glu Ala Pro 
145 150 155 160 

Thr Tyr Lys Aso Thr Leu Leu Pro Lys Leu Pro Ser Cys Asn Leu Glu 
165 170 175 

He He Asp Has He Val Gly Asn Gin Pro Aso Gin Glu Met Glu Ser 
180 185 190 

Ala Ser Glu Trp Tyr Leu Lys Asn Leu Gln Phe His Arg Phe Trp Ser
        195                 200                 205

Val Asp Asp Thr Gln Val His Thr Glu Tyr Ser Ser Leu Arg Ser Ile
    210                 215                 220

Val Val Ala Asn Tyr Glu Glu Ser He Lys Met Pro He Asn Glu Pro 
225 230 235 240 

Ala Pro Gly Arg Lvs Lys Ser Gin He Gin Glu Tyr Val Asp Tyr Asn 
245 250 255 

Gly Gly Ala Gly Val Gin His lie Ala Leu Arq Thr Glu Asp lie He 
260 265 270 

Thr Thr lie Arg His Leu Arg Glu Arg Gly Met Glu Phe Leu Ala Val 
275 200 285 

Pro Ser Ser Tyr Tyr Arg Leu Leu Arg Glu Asn Leu Lys Thr Ser Lys 
290 ' 295 300 

lie Gin Val Lys Glu Asn Met Asp Val Leu Glu Giu Leu Lys He Leu 
3C5 310 315 320 

Val Aso Tvr Asd Glu Lvs Giv Tyr Leu Leu Gin Ho rr.e Thr Lys Pro 
325 " ' 330 335 

Xet Gin Aso Arg Pro Thr Leu Phe Leu Giu Val He Gl:: Arq His Asn 
340 345 350 

His Gin Glv Phe Gly Ala Gly Asn Phe Asn Ser Leu Phe Lys Ala Phe 
355 360 365 

Glu Giu Glu Gin Ala Leu Arq Gly 
370 375 

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1766 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Zea mays

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 261..1595

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

ACTAGTTGTG AGAGCCTTCT GCGTTGGCAA TTGGCAGTAC AAGACAAATC ACATCCGCAA 60

CCGCAACCAC AGAATCGTCC GTCCACGTGG CCCCCATCAC TTCCCTTTAT TTACCAGTCG 120

TCCCCCATCC CCAGGGCChC CCACCAACAA GTGCAGTCAC CCGAGCCGCA AACTGCAGCT 180

CTGCAAGCTA CAGAGGCCAC CACGAGTCCA CGACGCCACG CCCTCCGAGA GAAAGAGAAA 240

GAGAAAACCA AAGCACGATA ATG CCC CCG ACC CCC ACA GCC GCC GCA GCC 290
Met Pro Pro Thr Pro Thr Ala Ala Ala Ala
1 5 10

GGC GCC GCC GTG GCG GCG GCA TCA GCA GCG GAG CAA GCG GCG TTC CGC 338
Gly Ala Ala Val Ala Ala Ala Ser Ala Ala Glu Gln Ala Ala Phe Arg
15 20 25

CTC GTG GGC CAC CGC AAC TTC GTC CGC TTC AAC CCG CGC TCC GAC CGC 386
Leu Val Gly His Arg Asn Phe Val Arg Phe Asn Pro Arg Ser Asp Arg
30 35 40

TTC CAC ACG CTC GCG TTC CAC CAC GTG GAG CTC TGG TGC GCC GAC 434
Phe His Thr Leu Ala Phe His His Val Glu Leu Trp Cys Ala Asp Ala
45 50 55

GCC TCC GCC GCG GGC CGC TTC TCC TTC GGC CTG GGC GCG CCG CTC GC: 482
Ala Ser Ala Ala Gly Arg Phe Ser Phe Gly Leu Gly Ala Pro Leu Ala
60 65 70

GCA CGC TCC GAC CTC TCC ACG GGC AAC TCC GCG CAC GCG TCC CTG CTG 530
Ala Arg Ser Asp Leu Ser Thr Gly Asn Ser Ala His Ala Ser Leu Leu
75 80 85 90

CTC CGC TCC GGC TCC CTC TCC TTC CTC TTC ACG GCG CCC TAC GCG CAC 578
Leu Arg Ser Gly Ser Leu Ser Phe Leu Phe Thr Ala Pro Tyr Ala His
95 100 105

GGC GCC GAC GCT GCC ACC GCC GCG CTG CCC TCC TTC TCC GCC GCC 626
Gly Ala Asp Ala Ala Thr Ala Ala Leu Pro Ser Phe Ser Ala Ala Ala
110 115 120

GCG CGG CGC TTC GCA GCC GAC CAC GGC CTC GCG GTG CGC GCC GTC GCG 674
Ala Arg Arg Phe Ala Ala Asp His Gly Leu Ala Val Arg Ala Val Ala
125 130 135

CTC CGC GTC GCC GAC GCC GAG GAC GCC TTC CGC GCC AGC GTC GCG GCC 722
Leu Arg Val Ala Asp Ala Glu Asp Ala Phe Arg Ala Ser Val Ala Ala
140 145 150

GGG GCG CGC CCG GCG TTC GGC CCC GTC GAC CTC GGC CGC GGC TTC CGC 770
Gly Ala Arg Pro Ala Phe Gly Pro Val Asp Leu Gly Arg Gly Phe Arg
155 160 165 170

CTC GCC GAG GTC GAG CTC TAC GGC GAC GTC GTG CTC CGG TAC GTG AGC 818
Leu Ala Glu Val Glu Leu Tyr Gly Asp Val Val Leu Arg Tyr Val Ser
175 180 185

TAC CCG GAC GGC GCC GCG GGC GAG CCC TTC CTG CCG GGG TTC GAG GGC 866
Tyr Pro Asp Gly Ala Ala Gly Glu Pro Phe Leu Pro Gly Phe Glu Gly
190 195 200

GTG GCC AGC CCC GGG GCG GCC GAC TAC GGG CTG AGC AGG TTC GAC CAC 914
Val Ala Ser Pro Gly Ala Ala Asp Tyr Gly Leu Ser Arg Phe Asp His
205 210 215

ATC GTC GGC AAC GTG CCG GAG CTG GCG CCC GCC GCC GCC TAC TTC GCC 962
Ile Val Gly Asn Val Pro Glu Leu Ala Pro Ala Ala Ala Tyr Phe Ala
220 225 230

GGC TTC ACG GGG TTC CAC GAG TTC GCC GAG TTC ACG ACG GAG GAC GTG 1010
Gly Phe Thr Gly Phe His Glu Phe Ala Glu Phe Thr Thr Glu Asp Val
235 240 245 250

GGC ACC GCG GAG AGC GGC CTC AAC TCC ATG GTG CTC GCC AAC AAC TCG 1058
Gly Thr Ala Glu Ser Gly Leu Asn Ser Met Val Leu Ala Asn Asn Ser
255 260 265

GAG AAC GTG CTG CTC CCG CTC AAC GAG CCG GTG CAC GGC ACC AAG CGC 1106
Glu Asn Val Leu Leu Pro Leu Asn Glu Pro Val His Gly Thr Lys Arg
270 275 280

CGC AGC CAG ATA CAA ACG TTC CTG GAC CAC CAC GGC GGC CCC GGC GTG 1154
Arg Ser Gln Ile Gln Thr Phe Leu Asp His His Gly Gly Pro Gly Val
285 290 295

CAG CAC ATG GCG CTG GCC AGC GAC GAC GTG CTC AGG ACG CTG AGG GAG 1202
Gln His Met Ala Leu Ala Ser Asp Asp Val Leu Arg Thr Leu Arg Glu
300 305 310

ATG CAG GCG CGC TCG GCC ATG GGC GGC TTC GAG TTC ATG GCG CCT CCC 1250
Met Gln Ala Arg Ser Ala Met Gly Gly Phe Glu Phe Met Ala Pro Pro
315 320 325 330

ACA TCC GAC TAC TAT GAC GGC GTG AGG CGG CGC GCC GGG GAC GTG CTC 1298
Thr Ser Asp Tyr Tyr Asp Gly Val Arg Arg Arg Ala Gly Asp Val Leu
335 340 345

ACG GAA GCA CAG ATT AAG GAG TGC CAG GAG CTA GGG GTG CTG GTG GAC 1346
Thr Glu Ala Gln Ile Lys Glu Cys Gln Glu Leu Gly Val Leu Val Asp
350 355 360

AGG GAT GAC CAG GGC GTG CTG CTC CAA ATC TTC ACC AAG CCA GTG GGG 1394
Arg Asp Asp Gln Gly Val Leu Leu Gln Ile Phe Thr Lys Pro Val Gly
365 370 375

GAC AGG CCA ACG CTG TTC TTC GAA ATC ATC CAA AGG ATC GGG TGC ATG 1442
Asp Arg Pro Thr Leu Phe Leu Glu Ile Ile Gln Arg Ile Gly Cys Met
380 385 390

GAG AAG GAT GAG AAG GGG CAA GAA TAC CAA AAG GGT GGC TGC GGC GGG 1490
Glu Lys Asp Glu Lys Gly Gln Glu Tyr Gln Lys Gly Gly Cys Gly Gly
395 400 405 410

TTC GGC AAG GGA AAC TTC TCG CAG CTG TTC AAG TCC ATC GAG GAT TAT 1538
Phe Gly Lys Gly Asn Phe Ser Gln Leu Phe Lys Ser Ile Glu Asp Tyr
415 420 425

GAG AAG TCC CTT GAA GCC AAG CAA GCT GCT GCA GCA GCT GCA GCT CAG 1586
Glu Lys Ser Leu Glu Ala Lys Gln Ala Ala Ala Ala Ala Ala Ala Gln
430 435 440

GGA TCC TAG GACAGTGCTT GGAGACGAGC AACTGCTGTG GCACTTTGTA 1635
Gly Ser

TCATGGAACA GAAATAATGA AGCGTCTTCT TTGTGACACT TGACATGCAA ATGTTTGTGT 1695

TCTGTAACCG TTGAATATAT GGGACGATGC TATGATGGTG TAATAGATGG TAGAGAGGGT 1755

ACAACCCTGA T 1766
(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 445 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Pro Pro Thr Pro Thr Ala Ala Ala Ala Gly Ala Ala Val Ala Ala
1 5 10 15

Ala Ser Ala Ala Glu Gln Ala Ala Phe Arg Leu Val Gly His Arg Asn
20 25 30

Phe Val Arg Phe Asn Pro Arg Ser Asp Arg Phe His Thr Leu Ala Phe
35 40 45

His His Val Glu Leu Trp Cys Ala Asp Ala Ala Ser Ala Ala Gly Arg
50 55 60

Phe Ser Phe Gly Leu Gly Ala Pro Leu Ala Ala Arg Ser Asp Leu Ser
65 70 75 80

Thr Gly Asn Ser Ala His Ala Ser Leu Leu Leu Arg Ser Gly Ser Leu
85 90 95

Ser Phe Leu Phe Thr Ala Pro Tyr Ala His Gly Ala Asp Ala Ala Thr
100 105 110

Ala Ala Leu Pro Ser Phe Ser Ala Ala Ala Ala Arg Arg Phe Ala Ala
115 120 125

Asp His Gly Leu Ala Val Arg Ala Val Ala Leu Arg Val Ala Asp Ala
130 135 140

Glu Asp Ala Phe Arg Ala Ser Val Ala Ala Gly Ala Arg Pro Ala Phe
145 150 155 160

Gly Pro Val Asp Leu Gly Arg Gly Phe Arg Leu Ala Glu Val Glu Leu
165 170 175

Tyr Gly Asp Val Val Leu Arg Tyr Val Ser Tyr Pro Asp Gly Ala Ala
180 185 190

Gly Glu Pro Phe Leu Pro Gly Phe Glu Gly Val Ala Ser Pro Gly Ala
195 200 205

Ala Asp Tyr Gly Leu Ser Arg Phe Asp His Ile Val Gly Asn Val Pro
210 215 220

Glu Leu Ala Pro Ala Ala Ala Tyr Phe Ala Gly Phe Thr Gly Phe His
225 230 235 240

Glu Phe Ala Glu Phe Thr Thr Glu Asp Val Gly Thr Ala Glu Ser Gly
245 250 255

Leu Asn Ser Met Val Leu Ala Asn Asn Ser Glu Asn Val Leu Leu Pro
260 265 270

Leu Asn Glu Pro Val His Gly Thr Lys Arg Arg Ser Gln Ile Gln Thr
275 280 285

Phe Leu Asp His His Gly Gly Pro Gly Val Gln His Met Ala Leu Ala
290 295 300

Ser Asp Asp Val Leu Arg Thr Leu Arg Glu Met Gln Ala Arg Ser Ala
305 310 315 320

Met Gly Gly Phe Glu Phe Met Ala Pro Pro Thr Ser Asp Tyr Tyr Asp
325 330 335

Gly Val Arg Arg Arg Ala Gly Asp Val Leu Thr Glu Ala Gln Ile Lys
340 345 350

Glu Cys Gln Glu Leu Gly Val Leu Val Asp Arg Asp Asp Gln Gly Val
355 360 365

Leu Leu Gln Ile Phe Thr Lys Pro Val Gly Asp Arg Pro Thr Leu Phe
370 375 380

Leu Glu Ile Ile Gln Arg Ile Gly Cys Met Glu Lys Asp Glu Lys Gly
385 390 395 400

Gln Glu Tyr Gln Lys Gly Gly Cys Gly Gly Phe Gly Lys Gly Asn Phe
405 410 415

Ser Gln Leu Phe Lys Ser Ile Glu Asp Tyr Glu Lys Ser Leu Glu Ala
420 425 430

Lys Gln Ala Ala Ala Ala Ala Ala Ala Gln Gly Ser
435 440

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1356 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Arabidopsis thaliana

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..1254

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1..2
(D) OTHER INFORMATION: /standard_name= "translation initiation codon"

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1252..1254
(D) OTHER INFORMATION: /standard_name= "translation termination codon"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

ATG TCC AAG TTC GTA AGA AAG AAT CCA AAG TCT GAT AAA TTC AAG GTT 48
Met Ser Lys Phe Val Arg Lys Asn Pro Lys Ser Asp Lys Phe Lys Val
1 5 10 15

AAG CGC TTC CAT CAC ATC GAG TTC TGG TGC GGC GAC GCA ACC AAC GTC 96
Lys Arg Phe His His Ile Glu Phe Trp Cys Gly Asp Ala Thr Asn Val
20 25 30

GCT CGT CGC TTC TCC TGG GGT CTG GGG ATG AGA TTC TCC GCC AAA TCC 144
Ala Arg Arg Phe Ser Trp Gly Leu Gly Met Arg Phe Ser Ala Lys Ser
35 40 45

GAT CTT TCC ACC GGA AAC ATG GTT CAC GCC TCT TAC CTA CTC ACC TCC 192
Asp Leu Ser Thr Gly Asn Met Val His Ala Ser Tyr Leu Leu Thr Ser
50 55 60

GGT GAC CTC CGA TTC CTT TTC ACT GCT CCT TAC TCT CCG TCT CTC TCC 240
Gly Asp Leu Arg Phe Leu Phe Thr Ala Pro Tyr Ser Pro Ser Leu Ser
65 70 75 80

GCC GGA GAG ATT AAA CCG ACA ACC ACA GCT TCT ATC CCA AGT TTC GAT 288
Ala Gly Glu Ile Lys Pro Thr Thr Thr Ala Ser Ile Pro Ser Phe Asp
85 90 95

CAC GGC TCT TGT CGT TCC TTC TTC TCT TCA CAT GGT CTC GGT GTT AGA 336
His Gly Ser Cys Arg Ser Phe Phe Ser Ser His Gly Leu Gly Val Arg
100 105 110

GCC GTT GCG ATT GAA GTA GAA GAC GCA GAG TCA GCT TTC TCC ATC AGT 384
Ala Val Ala Ile Glu Val Glu Asp Ala Glu Ser Ala Phe Ser Ile Ser
115 120 125

GTA GCT AAT GGC GCT ATT CCT TCG TCG CCT CCT ATC GTC CTC AAT GAA 432
Val Ala Asn Gly Ala Ile Pro Ser Ser Pro Pro Ile Val Leu Asn Glu
130 135 140

GCA GTT ACG ATC GCT GAG GTT AAA CTA TAC GGC GAT GTT GTT CTC CGA 480
Ala Val Thr Ile Ala Glu Val Lys Leu Tyr Gly Asp Val Val Leu Arg
145 150 155 160

TAT GTT AGT TAC AAA GCA GAA GAT ACC GAA AAA TCC GAA TTC TTG CCA 528
Tyr Val Ser Tyr Lys Ala Glu Asp Thr Glu Lys Ser Glu Phe Leu Pro
165 170 175

GGG TTC GAG CGT GTA GAG GAT GCG TCG TCG TTC CCA TTG GAT TAT GGT 576
Gly Phe Glu Arg Val Glu Asp Ala Ser Ser Phe Pro Leu Asp Tyr Gly
180 185 190

ATC CGG CGG CTT GAC CAC GCC GTG GGA AAC GTT CCT GAG CTT GGT CCG 624
Ile Arg Arg Leu Asp His Ala Val Gly Asn Val Pro Glu Leu Gly Pro
195 200 205

GCT TTA ACT TAT GTA GCG GGG TTC ACT GGT TTT CAC CAA TTC GCA GAG 672
Ala Leu Thr Tyr Val Ala Gly Phe Thr Gly Phe His Gln Phe Ala Glu
210 215 220

TTC ACA GCA GAC GAC GTT GGA ACC GCC GAG AGC GGT TTA AAT TCA GCG 720
Phe Thr Ala Asp Asp Val Gly Thr Ala Glu Ser Gly Leu Asn Ser Ala
225 230 235 240

GTC CTG GCT AGC AAT GAT GAA ATG GTT CTT CTA CCG ATT AAC GAG CCA 768
Val Leu Ala Ser Asn Asp Glu Met Val Leu Leu Pro Ile Asn Glu Pro
245 250 255

GTG CAC GGA ACA AAG AGG AAG AGT CAG ATT CAG ACG TAT TTG GAA CAT 816
Val His Gly Thr Lys Arg Lys Ser Gln Ile Gln Thr Tyr Leu Glu His
260 265 270

AAC GAA GGC GCA GGG CTA CAA CAT CTG GCT CTG ATG AGT GAA GAC ATA 864
Asn Glu Gly Ala Gly Leu Gln His Leu Ala Leu Met Ser Glu Asp Ile
275 280 285

TTC AGG ACC CTG AGA GAG ATG AGG AAG AGG AGC AGT ATT GGA GGA TTC 912
Phe Arg Thr Leu Arg Glu Met Arg Lys Arg Ser Ser Ile Gly Gly Phe
290 295 300

GAC TTC ATG CCT TCT CCT CCG CCT ACT TAC TAC CAG AAT CTC AAG AAA 960
Asp Phe Met Pro Ser Pro Pro Pro Thr Tyr Tyr Gln Asn Leu Lys Lys
305 310 315 320

CGG GTC GGC GAC GTG CTC AGC GAT GAT CAG ATC AAG GAG TGT GAG GAA 1008
Arg Val Gly Asp Val Leu Ser Asp Asp Gln Ile Lys Glu Cys Glu Glu
325 330 335

TTA GGG ATT CTT GTA GAC AGA GAT GAT CAA GGG ACG TTG CTT CAA ATC 1056
Leu Gly Ile Leu Val Asp Arg Asp Asp Gln Gly Thr Leu Leu Gln Ile
340 345 350

TTC ACA AAA CCA CTA GGT GAC AGG CCG ACG ATA TTT ATA GAG ATA ATC 1104
Phe Thr Lys Pro Leu Gly Asp Arg Pro Thr Ile Phe Ile Glu Ile Ile
355 360 365

CAG AGA GTA GGA TGC ATG ATG AAA GAT GAG GAA GGG AAG GCT TAC CAG 1152
Gln Arg Val Gly Cys Met Met Lys Asp Glu Glu Gly Lys Ala Tyr Gln
370 375 380

AGT GGA GGA TGT GGT GGT TTT GGC AAA GGC AAT TTC TCT GAG CTC TTC 1200
Ser Gly Gly Cys Gly Gly Phe Gly Lys Gly Asn Phe Ser Glu Leu Phe
385 390 395 400

AAG TCC ATT GAA GAA TAC GAA AAG ACT CTT GAA GCC AAA CAG TTA GTG 1248
Lys Ser Ile Glu Glu Tyr Glu Lys Thr Leu Glu Ala Lys Gln Leu Val
405 410 415

GGA TGA ACAAGAAGAA GAACCAACTA AAGGATTGTG TAATTAATGT AAAACTGTTT 1304
Gly

TATCTTATCA AAACAATGTA TACAACATCT CATTTAAAAA CGAGATCAAT CC 1356
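
The pairing of codons and residues in a listing such as SEQ ID NO: 12 can be spot-checked by machine. The following is a minimal sketch, not part of the original disclosure: it assumes the standard genetic code, and the names `CODON_TABLE`, `THREE_LETTER` and `translate` are illustrative only. It translates the first row of SEQ ID NO: 12 and prints three-letter residue codes for comparison with the listing.

```python
# Standard genetic code with bases ordered T, C, A, G (assumed here;
# the listing itself does not state which translation table was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# One-letter to three-letter residue codes, as used in the sequence listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon.

    Whitespace is ignored, so rows can be pasted as printed in the listing.
    """
    dna = "".join(dna.split()).upper()
    residues = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        amino = CODON_TABLE[dna[pos:pos + 3]]
        if amino == "*":  # stop codon ends the open reading frame
            break
        residues.append(THREE_LETTER[amino])
    return residues


# First row of SEQ ID NO: 12 (nucleotides 1 to 48).
first_row = "ATG TCC AAG TTC GTA AGA AAG AAT CCA AAG TCT GAT AAA TTC AAG GTT"
print(" ".join(translate(first_row)))
```

Run on the first row, the sketch should reproduce the residues printed beneath it in the listing (Met Ser Lys Phe Val Arg and so on). A mismatch on any other row would point to a transcription error in either the codon line or the residue line.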

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 418 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Met Ser Lys Phe Val Arg Lys Asn Pro Lys Ser Asp Lys Phe Lys Val
1 5 10 15

Lys Arg Phe His His Ile Glu Phe Trp Cys Gly Asp Ala Thr Asn Val
20 25 30

Ala Arg Arg Phe Ser Trp Gly Leu Gly Met Arg Phe Ser Ala Lys Ser
35 40 45

Asp Leu Ser Thr Gly Asn Met Val His Ala Ser Tyr Leu Leu Thr Ser
50 55 60

Gly Asp Leu Arg Phe Leu Phe Thr Ala Pro Tyr Ser Pro Ser Leu Ser
65 70 75 80

Ala Gly Glu Ile Lys Pro Thr Thr Thr Ala Ser Ile Pro Ser Phe Asp
85 90 95

His Gly Ser Cys Arg Ser Phe Phe Ser Ser His Gly Leu Gly Val Arg
100 105 110

Ala Val Ala Ile Glu Val Glu Asp Ala Glu Ser Ala Phe Ser Ile Ser
115 120 125

Val Ala Asn Gly Ala Ile Pro Ser Ser Pro Pro Ile Val Leu Asn Glu
130 135 140

Ala Val Thr Ile Ala Glu Val Lys Leu Tyr Gly Asp Val Val Leu Arg
145 150 155 160

Tyr Val Ser Tyr Lys Ala Glu Asp Thr Glu Lys Ser Glu Phe Leu Pro
165 170 175

Gly Phe Glu Arg Val Glu Asp Ala Ser Ser Phe Pro Leu Asp Tyr Gly
180 185 190

Ile Arg Arg Leu Asp His Ala Val Gly Asn Val Pro Glu Leu Gly Pro
195 200 205

Ala Leu Thr Tyr Val Ala Gly Phe Thr Gly Phe His Gln Phe Ala Glu
210 215 220

Phe Thr Ala Asp Asp Val Gly Thr Ala Glu Ser Gly Leu Asn Ser Ala
225 230 235 240

Val Leu Ala Ser Asn Asp Glu Met Val Leu Leu Pro Ile Asn Glu Pro
245 250 255

Val His Gly Thr Lys Arg Lys Ser Gln Ile Gln Thr Tyr Leu Glu His
260 265 270

Asn Glu Gly Ala Gly Leu Gln His Leu Ala Leu Met Ser Glu Asp Ile
275 280 285

Phe Arg Thr Leu Arg Glu Met Arg Lys Arg Ser Ser Ile Gly Gly Phe
290 295 300

Asp Phe Met Pro Ser Pro Pro Pro Thr Tyr Tyr Gln Asn Leu Lys Lys
305 310 315 320

Arg Val Gly Asp Val Leu Ser Asp Asp Gln Ile Lys Glu Cys Glu Glu
325 330 335

Leu Gly Ile Leu Val Asp Arg Asp Asp Gln Gly Thr Leu Leu Gln Ile
340 345 350

Phe Thr Lys Pro Leu Gly Asp Arg Pro Thr Ile Phe Ile Glu Ile Ile
355 360 365

Gln Arg Val Gly Cys Met Met Lys Asp Glu Glu Gly Lys Ala Tyr Gln
370 375 380

Ser Gly Gly Cys Gly Gly Phe Gly Lys Gly Asn Phe Ser Glu Leu Phe
385 390 395 400

Lys Ser Ile Glu Glu Tyr Glu Lys Thr Leu Glu Ala Lys Gln Leu Val
405 410 415

Gly *

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1448 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Arabidopsis thaliana

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 9..1346

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 9..11
(D) OTHER INFORMATION: /standard_name= "translation initiation codon"

(ix) FEATURE:
(A) NAME/KEY: misc_feature
(B) LOCATION: 1344..1346
(D) OTHER INFORMATION: /standard_name= "translation termination codon"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

TGAAATCA ATG GGC CAC CAA AAC GCC GCC GTT TCA GAG AAT CAA AAC CAT 50
Met Gly His Gln Asn Ala Ala Val Ser Glu Asn Gln Asn His
1 5 10

GAT GAC GGC GCT GCG TCG TCG CCG GGA TTC AAG CTC GTC GGA TTT TCC 98
Asp Asp Gly Ala Ala Ser Ser Pro Gly Phe Lys Leu Val Gly Phe Ser
15 20 25 30

AAG TTC GTA AGA AAG AAT CCA AAG TCT GAT AAA TTC AAG GTT AAG CGC 146
Lys Phe Val Arg Lys Asn Pro Lys Ser Asp Lys Phe Lys Val Lys Arg
35 40 45

TTC CAT CAC ATC GAG TTC TGG TGC GGC GAC GCA ACC AAC GTC GCT CGT 194
Phe His His Ile Glu Phe Trp Cys Gly Asp Ala Thr Asn Val Ala Arg
50 55 60

CGC TTC TCC TGG GGT CTG GGG ATG AGA TTC TCC GCC AAA TCC GAT CTT 242
Arg Phe Ser Trp Gly Leu Gly Met Arg Phe Ser Ala Lys Ser Asp Leu
65 70 75

TCC ACC GGA AAC ATG GTT CAC GCC TCT TAC CTA CTC ACC TCC GGT GAC 290
Ser Thr Gly Asn Met Val His Ala Ser Tyr Leu Leu Thr Ser Gly Asp
80 85 90

CTC CGA TTC CTT TTC ACT GCT CCT TAC TCT CCG TCT CTC TCC GCC GGA 338
Leu Arg Phe Leu Phe Thr Ala Pro Tyr Ser Pro Ser Leu Ser Ala Gly
95 100 105 110

GAG ATT AAA CCG ACA ACC ACA GCT TCT ATC CCA AGT TTC GAT CAC GGC 386
Glu Ile Lys Pro Thr Thr Thr Ala Ser Ile Pro Ser Phe Asp His Gly
115 120 125

TCT TGT CGT TCC TTC TTC TCT TCA CAT GGT CTC GGT GTT AGA GCC GTT 434
Ser Cys Arg Ser Phe Phe Ser Ser His Gly Leu Gly Val Arg Ala Val
130 135 140

GCG ATT GAA GTA GAA GAC GCA GAG TCA GCT TTC TCC ATC AGT GTA GCT 482
Ala Ile Glu Val Glu Asp Ala Glu Ser Ala Phe Ser Ile Ser Val Ala
145 150 155

AAT GGC GCT ATT CCT TCG TCG CCT CCT ATC GTC CTC AAT GAA GCA GTT 530
Asn Gly Ala Ile Pro Ser Ser Pro Pro Ile Val Leu Asn Glu Ala Val
160 165 170

ACG ATC GCT GAG GTT AAA CTA TAC GGC GAT GTT GTT CTC CGA TAT GTT 578
Thr Ile Ala Glu Val Lys Leu Tyr Gly Asp Val Val Leu Arg Tyr Val
175 180 185 190

AGT TAC AAA GCA GAA GAT ACC GAA AAA TCC GAA TTC TTG CCA GGG TTC 626
Ser Tyr Lys Ala Glu Asp Thr Glu Lys Ser Glu Phe Leu Pro Gly Phe
195 200 205

GAG CGT GTA GAG GAT GCG TCG TCG TTC CCA TTG GAT TAT GGT ATC CGG 674
Glu Arg Val Glu Asp Ala Ser Ser Phe Pro Leu Asp Tyr Gly Ile Arg
210 215 220

CGG CTT GAC CAC GCC GTG GGA AAC GTT CCT GAG CTT GGT CCG GCT TTA 722
Arg Leu Asp His Ala Val Gly Asn Val Pro Glu Leu Gly Pro Ala Leu
225 230 235

ACT TAT GTA GCG GGG TTC ACT GGT TTT CAC CAA TTC GCA GAG TTC ACA 770
Thr Tyr Val Ala Gly Phe Thr Gly Phe His Gln Phe Ala Glu Phe Thr
240 245 250

GCA GAC GAC GTT GGA ACC GCC GAG AGC GGT TTA AAT TCA GCG GTC CTG 818
Ala Asp Asp Val Gly Thr Ala Glu Ser Gly Leu Asn Ser Ala Val Leu
255 260 265 270

GCT AGC AAT GAT GAA ATG GTT CTT CTA CCG ATT AAC GAG CCA GTG CAC 866
Ala Ser Asn Asp Glu Met Val Leu Leu Pro Ile Asn Glu Pro Val His
275 280 285

GGA ACA AAG AGG AAG AGT CAG ATT CAG ACG TAT TTG GAA CAT AAC GAA 914
Gly Thr Lys Arg Lys Ser Gln Ile Gln Thr Tyr Leu Glu His Asn Glu
290 295 300

GGC GCA GGG CTA CAA CAT CTG GCT CTG ATG AGT GAA GAC ATA TTC AGG 962
Gly Ala Gly Leu Gln His Leu Ala Leu Met Ser Glu Asp Ile Phe Arg
305 310 315

ACC CTG AGA GAG ATG AGG AAG AGG AGC AGT ATT GGA GGA TTC GAC TTC 1010
Thr Leu Arg Glu Met Arg Lys Arg Ser Ser Ile Gly Gly Phe Asp Phe
320 325 330

ATG CCT TCT CCT CCG CCT ACT TAC TAC CAG AAT CTC AAG AAA CGG GTC 1058
Met Pro Ser Pro Pro Pro Thr Tyr Tyr Gln Asn Leu Lys Lys Arg Val
335 340 345 350

GGC GAC GTG CTC AGC GAT GAT CAG ATC AAG GAG TGT GAG GAA TTA GGG 1106
Gly Asp Val Leu Ser Asp Asp Gln Ile Lys Glu Cys Glu Glu Leu Gly
355 360 365

ATT CTT GTA GAC AGA GAT GAT CAA GGG ACG TTG CTT CAA ATC TTC ACA 1154
Ile Leu Val Asp Arg Asp Asp Gln Gly Thr Leu Leu Gln Ile Phe Thr
370 375 380

AAA CCA CTA GGT GAC AGG CCG ACG ATA TTT ATA GAG ATA ATC CAG AGA 1202
Lys Pro Leu Gly Asp Arg Pro Thr Ile Phe Ile Glu Ile Ile Gln Arg
385 390 395

GTA GGA TGC ATG ATG AAA GAT GAG GAA GGG AAG GCT TAC CAG AGT GGA 1250
Val Gly Cys Met Met Lys Asp Glu Glu Gly Lys Ala Tyr Gln Ser Gly
400 405 410

GGA TGT GGT GGT TTT GGC AAA GGC AAT TTC TCT GAG CTC TTC AAG TCC 1298
Gly Cys Gly Gly Phe Gly Lys Gly Asn Phe Ser Glu Leu Phe Lys Ser
415 420 425 430

ATT GAA GAA TAC GAA AAG ACT CTT GAA GCC AAA CAG TTA GTG GGA TGA 1346
Ile Glu Glu Tyr Glu Lys Thr Leu Glu Ala Lys Gln Leu Val Gly
435 440 445

ACAAGAAGAA GAACCAACTA AAGGATTGTG TAATTAATGT AAAACTGTTT TATCTTATCA 1406

AAACAATGTA TACAACATCT CATTTAAAAA CGAGATCAAT CC 1448

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 446 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Met Gly His Gln Asn Ala Ala Val Ser Glu Asn Gln Asn His Asp Asp
1 5 10 15

Gly Ala Ala Ser Ser Pro Gly Phe Lys Leu Val Gly Phe Ser Lys Phe
20 25 30

Val Arg Lys Asn Pro Lys Ser Asp Lys Phe Lys Val Lys Arg Phe His
35 40 45

His Ile Glu Phe Trp Cys Gly Asp Ala Thr Asn Val Ala Arg Arg Phe
50 55 60

Ser Trp Gly Leu Gly Met Arg Phe Ser Ala Lys Ser Asp Leu Ser Thr
65 70 75 80

Gly Asn Met Val His Ala Ser Tyr Leu Leu Thr Ser Gly Asp Leu Arg
85 90 95

Phe Leu Phe Thr Ala Pro Tyr Ser Pro Ser Leu Ser Ala Gly Glu Ile
100 105 110

Lys Pro Thr Thr Thr Ala Ser Ile Pro Ser Phe Asp His Gly Ser Cys
115 120 125

Arg Ser Phe Phe Ser Ser His Gly Leu Gly Val Arg Ala Val Ala Ile
130 135 140

Glu Val Glu Asp Ala Glu Ser Ala Phe Ser Ile Ser Val Ala Asn Gly
145 150 155 160

Ala Ile Pro Ser Ser Pro Pro Ile Val Leu Asn Glu Ala Val Thr Ile
165 170 175

Ala Glu Val Lys Leu Tyr Gly Asp Val Val Leu Arg Tyr Val Ser Tyr
180 185 190

Lys Ala Glu Asp Thr Glu Lys Ser Glu Phe Leu Pro Gly Phe Glu Arg
195 200 205

Val Glu Asp Ala Ser Ser Phe Pro Leu Asp Tyr Gly Ile Arg Arg Leu
210 215 220

Asp His Ala Val Gly Asn Val Pro Glu Leu Gly Pro Ala Leu Thr Tyr
225 230 235 240

Val Ala Gly Phe Thr Gly Phe His Gln Phe Ala Glu Phe Thr Ala Asp
245 250 255

Asp Val Gly Thr Ala Glu Ser Gly Leu Asn Ser Ala Val Leu Ala Ser
260 265 270

Asn Asp Glu Met Val Leu Leu Pro Ile Asn Glu Pro Val His Gly Thr
275 280 285

Lys Arg Lys Ser Gln Ile Gln Thr Tyr Leu Glu His Asn Glu Gly Ala
290 295 300

Gly Leu Gln His Leu Ala Leu Met Ser Glu Asp Ile Phe Arg Thr Leu
305 310 315 320

Arg Glu Met Arg Lys Arg Ser Ser Ile Gly Gly Phe Asp Phe Met Pro
325 330 335

Ser Pro Pro Pro Thr Tyr Tyr Gln Asn Leu Lys Lys Arg Val Gly Asp
340 345 350

Val Leu Ser Asp Asp Gln Ile Lys Glu Cys Glu Glu Leu Gly Ile Leu
355 360 365

Val Asp Arg Asp Asp Gln Gly Thr Leu Leu Gln Ile Phe Thr Lys Pro
370 375 380

Leu Gly Asp Arg Pro Thr Ile Phe Ile Glu Ile Ile Gln Arg Val Gly
385 390 395 400

Cys Met Met Lys Asp Glu Glu Gly Lys Ala Tyr Gln Ser Gly Gly Cys
405 410 415

Gly Gly Phe Gly Lys Gly Asn Phe Ser Glu Leu Phe Lys Ser Ile Glu
420 425 430

Glu Tyr Glu Lys Thr Leu Glu Ala Lys Gln Leu Val Gly
435 440 445
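
The CDS feature locations given for SEQ ID NOs: 10, 12 and 14 can be checked against the stated protein lengths with simple arithmetic. The sketch below is illustrative only and not part of the original disclosure; the helper name `codon_count` and the dictionary `CDS_FEATURES` are assumptions introduced here, while the coordinates are copied from the feature tables above.

```python
def codon_count(start, end):
    """Number of codons in a CDS given 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of three")
    return length // 3


# CDS locations as stated in the feature tables of the sequence listing.
CDS_FEATURES = {
    "SEQ ID NO: 10": (261, 1595),
    "SEQ ID NO: 12": (1, 1254),
    "SEQ ID NO: 14": (9, 1346),
}

for name, (start, end) in CDS_FEATURES.items():
    codons = codon_count(start, end)
    # Each CDS range ends on the termination codon, so the encoded
    # polypeptide has one residue fewer than the number of codons.
    print(f"{name}: {codons} codons, {codons - 1} residues before the stop")
```

Under these coordinates the three coding regions span 445, 418 and 446 codons respectively, each count including the termination codon.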

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 513 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Vernonia galamensis

(vii) IMMEDIATE SOURCE:
(B) CLONE: vsi.pic0015.b2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

CCACACCGAT TGCCGGAACT TCACCGCCTC TCACGGCCTT GCAGTCCGAG CAATCGCCAT 60

TGAAGTCGAT GACGCCGAAT TAGCTTTCTC CGTCAGCGTC TCTCACGGCG CTAAACCCTC 120

CGCTGCTCCT GTAACCCTTG GAAACAACGA CGTCGTATTG TCTGAAGTTA AGCTTTACGG 180

CGATGTCGCT TTCCGGTACA TAAGTTACAA AAATCCGAAC TATACATCTT CCTTTTTGCC 240

CGGGTTCGAG CCCGTTGAAA AGACGTCGTC GTTTTATGAC CTTGACTACG GTATCCGCCG 300

TTTGGACCAC GCCGTAGGNA ACGTCCCTGA GCTTGCTTCG GCAGTGGACT ACGTGAAATC 360

ATTCACCGGA TTCCATGAGT TCCCCGAATT CACCGCGGAG GACGTCGGGA CGAGCGAGAG 420

GGAACTGAAT TCGGTCGTTT TAGCTTGCAA CAGTGAGATG GTCTTGATTC CGATGAACGA 480

GCCGGTGTAC GGAANAAAAG GAAGNAGCCA GAT 513
INDICATIONS RELATING TO A DEPOSITED MICROORGANISM

(PCT Rule 13bis)

A. The indications made below relate to the microorganism referred to in the description
on page [illegible], line 1

B. IDENTIFICATION OF DEPOSIT    Further deposits are identified on an additional sheet

Name of depositary institution
AMERICAN TYPE CULTURE COLLECTION

Address of depositary institution (including postal code and country)
12301 Parklawn Drive
Rockville, Maryland 20852
US

Date of deposit
25 June 1996 (25.06.96)

Accession Number
98083

C. ADDITIONAL INDICATIONS (leave blank if not applicable)    This information is continued on an additional sheet

In respect of those designations in which a European patent is sought,
a sample of the deposited microorganism will be made available until
the publication of the mention of the grant of the European patent or
until the date on which the application has been refused or withdrawn
or is deemed to be withdrawn, only by the issue of such a sample to an
expert nominated by the person requesting the sample. (Rule 28(4) EPC)

D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)

The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession
Number of Deposit")

For receiving Office use only
This sheet was received with the international application

For International Bureau use only
This sheet was received by the International Bureau on:

Authorized officer

Form PCT/RO/134 (July 1992)
INDICATIONS RELATING TO A DEPOSITED MICROORGANISM

(PCT Rule 13bis)



A. The indications made below relate to the microorganism referred to in the description
on page 6, line 1

B. IDENTIFICATION OF DEPOSIT    Further deposits are identified on an additional sheet

Name of depositary institution
AMERICAN TYPE CULTURE COLLECTION

Address of depositary institution (including postal code and country)
12301 Parklawn Drive
Rockville, Maryland 20852
US

Date of deposit
25 June 1996 (25.06.96)

Accession Number
97622

C. ADDITIONAL INDICATIONS (leave blank if not applicable)    This information is continued on an additional sheet

In respect of those designations in which a European patent is sought,
a sample of the deposited microorganism will be made available until
the publication of the mention of the grant of the European patent or
until the date on which the application has been refused or withdrawn
or is deemed to be withdrawn, only by the issue of such a sample to an
expert nominated by the person requesting the sample. (Rule 28(4) EPC)

D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)

The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession
Number of Deposit")

For receiving Office use only
This sheet was received with the international application

For International Bureau use only
This sheet was received by the International Bureau on:

Authorized officer

Form PCT/RO/134 (July 1992)
INDICATIONS RELATING TO A DEPOSITED MICROORGANISM

(PCT Rule 13bis)



A. The indications made below relate to the microorganism referred to in the description
on page [illegible], line [illegible]

B. IDENTIFICATION OF DEPOSIT    Further deposits are identified on an additional sheet

Name of depositary institution
AMERICAN TYPE CULTURE COLLECTION

Address of depositary institution (including postal code and country)
12301 Parklawn Drive
Rockville, Maryland 20852
US

Date of deposit
12 June 1997

Accession Number
209120

C. ADDITIONAL INDICATIONS (leave blank if not applicable)    This information is continued on an additional sheet

In respect of those designations in which a European patent is sought,
a sample of the deposited microorganism will be made available until
the publication of the mention of the grant of the European patent or
until the date on which the application has been refused or withdrawn
or is deemed to be withdrawn, only by the issue of such a sample to an
expert nominated by the person requesting the sample. (Rule 28(4) EPC)

D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)

The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession
Number of Deposit")

For receiving Office use only
CLAIMS

1. An isolated nucleic acid fragment encoding a plant p-hydroxyphenylpyruvate
dioxygenase enzyme, the fragment comprising a nucleotide
sequence selected from the group consisting of

nucleotide sequences encoding a polypeptide comprising the amino
acid sequences set forth in SEQ ID NO:3, SEQ ID NO:11, SEQ ID
NO:13, and SEQ ID NO:15 and

modified nucleotide sequences essentially similar to the nucleotide
sequences of SEQ ID NO:2, SEQ ID NO:10, SEQ ID NO:12 and
SEQ ID NO:14 containing deletions, insertions, or substitutions in
the sequence that do not affect the functional properties of the
encoded protein.

2. An isolated nucleic acid fragment encoding a plant p-hydroxyphenylpyruvate
dioxygenase enzyme, the fragment comprising a nucleotide sequence as
set forth in SEQ ID NO:14.

3. A chimeric gene comprising the nucleic acid fragment of Claims 1 or
2 operably linked to at least one suitable regulatory sequence.

4. The chimeric gene of Claim 3 wherein at least one suitable regulatory
sequence directs gene expression in a microorganism.

5. The chimeric gene of Claim 3 wherein the at least one suitable
regulatory sequence directs gene expression in a plant.

6. A plasmid vector comprising the nucleic acid fragment of Claims 1 or
2 operably linked to at least one suitable regulatory sequence.

7. A transformed host cell comprising a host cell and the plasmid vector
of Claim 6.

8. The transformed host cell of Claim 7 wherein the host cell is derived
from a plant or is a microorganism.

9. The transformed host cell of Claim 8 wherein the microorganism is
E. coli.

10. A transformed plant tolerant to contact with at least one compound
that inhibits the rate of the reaction of p-hydroxyphenylpyruvate dioxygenase
enzyme in a non-transformed plant, the transformed plant comprising the chimeric
gene of Claim 3 and a host plant.

11. The transformed plant of Claim 10 wherein the host plant is a cereal
crop plant.

12. A method to identify a compound useful for its ability to inhibit the
rate of the reaction of p-hydroxyphenylpyruvate dioxygenase enzyme comprising:

(a) transforming a host cell with the plasmid vector of Claim 6;

(b) facilitating expression of the nucleic acid fragment encoding the
plant p-hydroxyphenylpyruvate dioxygenase enzyme;

(c) contacting the expressed enzyme from step (b) with a test
compound; and

(d) evaluating the capacity of the test compound to inhibit the rate of
the reaction of p-hydroxyphenylpyruvate dioxygenase enzyme.

13. The method of Claim 12 wherein evaluating the capacity of the test
compound to inhibit the rate of the reaction of p-hydroxyphenylpyruvate
dioxygenase enzyme is accomplished by measuring oxygen utilization, carbon
dioxide release, homogentisate production, loss of p-hydroxyphenylpyruvate or
maleylacetoacetate production.

14. The method of Claim 12 wherein the transformed host cell is an
E. coli that comprises a chimeric gene encoding a plant p-hydroxyphenylpyruvate
dioxygenase enzyme.

15. A compound that inhibits the activity of a plant p-hydroxyphenylpyruvate
dioxygenase enzyme, the compound identified by the method of
Claim 14.

16. A method for imparting tolerance to a plant to at least one compound 
that inhibits the rate of reaction of /7-hydroxypheny [pyruvate dioxygenase enzvmc 

20 comprising: 

(a) transforming a host plant cell with a chimeric gene comprising a 
nucleic acid fragment encoding plant /?-hydroxyphenylpyruvatc 
dioxygenase. and 

(b) expressing the chimeric gene in an amount effective to render 
25 the transformed plant substantially tolerant to the at least one 

compound that inhibits the rate of reaction of />hydroxyphenyl- 
pyruvate dioxygenase. 

17. A method for the microbial production of active plant /?-hydroxy- 
phenylpyruvate dioxygenase enzyme comprising: 

30 (a) stably transforming a microorganism with the chimeric gene of 

Claim 4 encoding the plant p-hydroxyphenylpyruvate 
dioxygenase; 

(b) facilitating expression by the chimeric gene for a suitable period; 
and 

35 (c) recovering active plant /7-hydroxyphenylpyruvate dioxygenase 

enzyme. 

18. A method to overexpress p-hydroxyphenylpyruvate dioxygenase 
enzyme in a plant comprising: 

(a) stably transforming a host plant cell with a chimeric DNA 
molecule comprising at least one copy of a suitable promoter to 
drive expression of an associated coding sequence in a plant cell 
operably linked to at least one copy of a homologous or 
heterologous coding sequence encoding p-hydroxyphenylpyruvate 
dioxygenase; and 

(b) growing the transformed host plant cell of step (a). 

19. The method of Claim 18 wherein the chimeric DNA molecule is the 
chimeric gene of Claim 5. 

20. An isolated nucleic acid fragment comprising a member selected from 
the group consisting of: 

(a) an isolated nucleic acid fragment as set forth in SEQ ID NO: 16; 

(b) an isolated nucleic acid fragment that is essentially similar to an 
isolated nucleic acid fragment as set forth in SEQ ID NO: 16; 
and 

(c) an isolated nucleic acid fragment that is complementary to (a) or 
(b). 

FIG. 1 



1 CAAGAAACGNGTCGNCGACGTGCTCAGCGATGATCAGATCAAGGAGTGTGAGGAATTAGG 
61 GATTCTTNTAGACAGAGATGATCAAGGGACGTTNCTTCAAATCTNCACAAAACCACTAGG 
121 TGACAGGCCGACGNTATTTATAGAGATAATCCAGAGNGTAGGATGCATGATGAAAGATGT 
181 GGAAGGGANGGCTTACCAGAGTGGAGNATNTNGTGGTTTTGGCAAAGGCAATT 
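
The partial sequence of FIG. 1 appears to overlap the longer sequence of FIG. 2. The comparison can be sketched in Python as follows, treating each N in the FIG. 1 read as an unresolved base that matches any nucleotide. The function name `find_with_wildcards`, the variable names, and the choice of a 120-nucleotide excerpt (FIG. 2 positions 1021 to 1140) are illustrative assumptions and are not part of the original disclosure.

```python
# First 60 nucleotides of FIG. 1; N marks a base that was not resolved.
FIG1_READ = "CAAGAAACGNGTCGNCGACGTGCTCAGCGATGATCAGATCAAGGAGTGTGAGGAATTAGG"

# Excerpt of FIG. 2 (assumed window chosen for this illustration).
FIG2_EXCERPT = (
    "CTCCGCCTACTTACTACCAGAATCTCAAGAAACGGGTCGGCGACGTGCTCAGCGATGATC"  # 1021-1080
    "AGATCAAGGAGTGTGAGGAATTAGGGATTCTTGTAGACAGAGATGATCAAGGGACGTTGC"  # 1081-1140
)
EXCERPT_START = 1021  # FIG. 2 position of the first base in the excerpt


def find_with_wildcards(read, target, wildcard="N"):
    """Return the first 0-based offset at which read matches target.

    A wildcard character in read matches any base. Returns -1 if no
    offset gives a full-length match.
    """
    for offset in range(len(target) - len(read) + 1):
        window = target[offset:offset + len(read)]
        if all(r == wildcard or r == t for r, t in zip(read, window)):
            return offset
    return -1


offset = find_with_wildcards(FIG1_READ, FIG2_EXCERPT)
position = EXCERPT_START + offset if offset >= 0 else None
print(offset, position)
```

With these two excerpts the read matches at offset 25 of the window, which corresponds to position 1046 in the numbering of FIG. 2.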




FIG. 2 

1 TGAAATCAATGGGCCACCAAAACGCCGCCGTTTCAGAGAATCAAAACCATGATGACGGCG 

61 CTGCGTCGTCGCCGGGATTCAAGCTCGTCGGATTTTCCAAGTTCGTAAGAAAGAATCCAA 

121 AGTCTGATAAATTCAAGGTTAAGCGCTTCCATCACATCGAGTTCTGGTGCGGGGACGCAA 

Eco47III 

181 CCAACGTCGCTCGTCGCTTCTCCTGGGGTCTGGGGATGAGATTCTCCGCCAAATCCGATC 

241 TTTCCACCGGAAACATGGTTCACGCCTCTTACCTACTCACCTCCGGTGAACTCCGATTCC 
301 TTTTCACTGCTCCTTACTCTCCGTCTCTCTCCGGCGGAGAGATTAAACCGACAACCACAG 

361 GTTCTATCCCAAGTTTCGATCACGGGTCTTGTCGGTCCTTCTTCTCTTCACATGGTCTCG 

421 GTGTTAGACCCGTTGCGATTGAAGTAGAAGACGCGGAGTCAGCTTTCTCCATCAGTGTAG 

481 CTAATGGCGCTATTCCTTCGTCGCCTCCTATCGTCCTCAATGAAGCAGTTACGATCGCTG 

541 AGGTTAAACTATACGGCGATGTTGTTCTCCGATATGTTAGTTACAAAGCAGAAGATACCG 

601 AAAAATCCGAATTCTTGCCAGGGTTCGAGCGTGTAGAGGATGCGTCGTCGTTCCCATTGG 
EcoRI 

661 ATTATGGTATCCGGCGGCTTGACCACGCCGTGGGAAACGTTCCTGAGCTTGGTCCGGCTT 
721 TAACTTATGTAGCGGGGTTCACTGGTTTTCACCAATTCGCAGAGTTCACAGCAGACGACG 

781 TTGGAACCGCCGAGAGCGGTTTAAATTCAGCGGTCCTGGCTAGCAATGATGAAATGGTTC 

NheI 

841 TTCTACCGATTAACGAGCCAGTGCACGGAACAAAGAGGAAGAGTCAGATTCAGACGTATT 
901 TGGAACATAACGAAGGCGCAGGGCTACAACATCTGGCTCTGATGAGTGAAGACATATTCA 
961 GGACCCTGAGAGAGATGAGGAAGAGGAGCAGTATTGGAGGATTCGACTTCATGCCTTCTC 
1021 CTCCGCCTACTTACTACCAGAATCTCAAGAAACGGGTCGGCGACGTGCTCAGCGATGATC 
1081 AGATCAAGGAGTGTGAGGAATTAGGGATTCTTGTAGACAGAGATGATCAAGGGACGTTGC 
1141 TTCAAATCTTCACAAAACCACTAGGTGACAGGCCGACGATATTTATAGAGATAATCCAGA 
1201 GAGTAGGATGCATGATGAAAGATGAGGAAGGGAAGGCTTACCAGAGTGGAGGATGTGGTG 
1261 GTTTTGCCAAAGGCAATTTCTCTGAGCTCTTCAAGTCCATTGAAGAATACGAAAAGACTC 
1321 TTGAAGCCAAACAGTTAGTGGGATGAACAAGAAGAAGAACCAACTAAAGGATTGTGTAAT 
1381 TAATGTAAAACTGTTTTATCTTATCAAAACAATGTATACAACATCTCATTTAAAAACGAG 
1441 ATCAATCC 

FIG. 3A 

IEFWCGDATN VARRFSWGLG MRFSAKSDLS TGNMVHASYL LTSGDLRFLF 

VELWCADAAS AAGRFSFGLG APLAARSDLS TGNSAHASLL LRSGSLSFLF 

VTFWVGNAKQ AASFYCNKMG FEPLAYKGLE TGSREVVSHV IKQGKIVFVL 

VTFWVGNAKQ AASFYCNKMG FEPLAYRGLE TGSREVVSHV IKRGKIVFVL 

VTFWVGNAKQ AASFYCSKMG FEPLAYRGLE TGSREVVSHV IKQGKIVFVL 

VTFWVGNAKQ AASYYCSKIG FEPLAYKGLE TGSREVVSHV VKQDKIVFVF 

TAPYSPSLSA GEIKPTTTAS IPSFDHGSCR SFFSSHGLGV RAVAIEVEDA 

TAPYAHGADA ATAA LPSFSAAAAR RFAADHGLAV RAVALRVADA 

CSALNPW NKEMG DHLVKHGDGV KDIAFEVEDC 

CSALNPW NKEMG DHLVKHGDGV KDIAFEVEDC 

SSALNPW NKEMG DHLVKHGDGV KDIAFEVEDC 

SSALNPW NKEMG DHLVKHGDGV KDIAFEVEDC 



Arabidopsis 
Corn 
Mouse 

ESAFSISVAN GAIPSSPPIV LNEAVTIAEV KLYGDVVLRY 

EDAFRASVAA GARPAFGPVD LGRGFRLAEV ELYGDVVLRY 

EHIVQKARER GAKIVREPWV EEDKFGKVKF AVLQTYGDTT 

DHIVQKARER GAKIVREPWV EQDKFGKVKF AVLQTYGDTT 

DYIVQKARER GAKIMREPWV EQDKFGKVKF AVLQTYGDTT 



Pig DYIVQKARER GAIIVREPWI EQDKFGKVKF AVLQTFGDTT 

VSYKAEDTEK 
VSY.PDGAAG 

SEFLPGFER. ..VEDASSFP 

EPFLPGFEG. ..V..ASPGA ADYGLSRFDH 

GRFLPGFEAP TYKDTLLPKL PSCNLEIIDH 

GRFLPGFEAP TYKDTLLPKL PRCNLEIIDH 

GQFLPGYEPP AFMDPLLPKL PKCSLEMIDH 

GCFLPGFEAP TFTDPLLSKL PKCGLEIIDH 

LDYGIRRLDH AVGNVP..EL GPALTYVAGF 
IVGNVP..EL APAAAYFAGF 

TGFHQFAEFT ADDVGTAESG LNSAVLASND 

TGFHEFAEFT TEDVGTAESG LNSMVLANNS 

LQFHRFWSVD DTQVHTEYSS LRSIVVANYE 

LQFHRFWSVD DTQVHTEYSS LRSIVVTNYE 

LQFHRFWSVD DTQVHTEYSS LRSIVVANYE 

LQFHRFWSVD DTQIHTEYSA LRSVVMANYE 

EMVLLPINEP VHGTKRKSQI 
ENVLLPLMEP VHGTKRRSQI 
ESIKMPINEP APG.RKKSQI 
ESIKMPINEP APG.KKKSQI 

FIG. 3B 



301 

Arabidopsis QTYLEHNEGA GLQHLALMSE 

Corn QTFLDHHGGP GVQHMALASD 

Rat QEYVDYNGGA GVQHIALRTE 

Mouse QEYVDYNGGA GVQHIALKTE 

Human QEYVDYNGGA GVQHIALKTE 

Pig QEYVDYNGGA GVQHIALKTE 

GVRR..RAGD 

IEIIQRVGCM MKDEEGKAYQ SGGCGGFGKG NFSELFKSIE EYEKTLEAKQ 

[illegible] EKDEKGQEYQ KGGCGGFGKG NFSQLFKSIE DYEKSLEAKQ 

[illegible] GFGAG NFNSLFKAFE E.EQALRG 

[illegible] GFGAG NFNSLFKAFE E.EQALRGNL 

[illegible] GFGAG NFNSLFKAFE E.EQNLRGNL 

[illegible]EVIQRNNHQ GFGAG NFNSLFKAFE E.EQELRGNL 
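
As a consistency check between the figures, the start of the Arabidopsis row in FIG. 3A can be compared with a translation of the nucleotide sequence in FIG. 2. The following Python sketch translates a 60-nucleotide excerpt of FIG. 2 (positions 156 to 215) with the standard genetic code. The reading frame, the excerpt boundaries, and the names `CODON_TABLE` and `translate` are assumptions made for this illustration and are not stated in the original text.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO)
}

# FIG. 2, positions 156-215 (assumed window and frame for this example).
FIG2_EXCERPT = "ATCGAGTTCTGGTGCGGGGACGCAACCAACGTCGCTCGTCGCTTCTCCTGGGGTCTGGGG"


def translate(dna):
    """Translate dna in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


protein = translate(FIG2_EXCERPT)
print(protein)
```

Under these assumptions the excerpt translates to IEFWCGDATNVARRFSWGLG, the first twenty residues of the Arabidopsis row at the start of FIG. 3A.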
